VARIANTS OF LAV VIRUSES, THEIR DNA- AND PROTEIN-COMPONENTS AND THEIR USES, PARTICULARLY FOR DIAGNOSTIC PURPOSES AND FOR THE PREPARATION OF IMMUNOGENIC COMPOSITIONS
The present invention relates to viruses capable of inducing lymphadenopathies (denoted below by the abbreviation LAS) and acquired immunodepressive syndromes (denoted below by the abbreviation AIDS), to antigens of said viruses, particularly in a purified form, and to processes for producing these antigens, particularly antigens of the envelopes of these viruses. The invention also relates to polypeptides, whether glycosylated or not, encoded by said DNA sequences.

The invention also relates to cloned DNA sequences hybridizable to genomic RNA and DNA of the new lymphadenopathy-associated viruses (LAV) disclosed hereafter, to processes for their preparation and to their uses. It relates more particularly to stable probes including a DNA sequence which can be used for the detection of the new LAV viruses, of related viruses, or of DNA proviruses in any medium, particularly in biological samples containing any of them.

A substantial genetic polymorphism has been recognized for the human retrovirus at the origin of the acquired immune deficiency syndrome (AIDS) and other diseases, like lymphadenopathy syndrome (LAS), AIDS-related complex (ARC) and probably some encephalopathies (for review see Weiss, 1984). Indeed all of the isolates analyzed until now have a distinct restriction map, even if recovered from the same place and time (BENN et al., 1985). Identical restriction maps have only been observed for the first two isolates, designated lymphadenopathy-associated virus, LAV (ALIZON et al., 1984) and human T-cell lymphotropic virus type 3, HTLV-3 (HAHN et al., 1984), which thus appear as an exception.
The genetic polymorphism of the AIDS virus was better assessed after the determination of the complete nucleotide sequence of LAV (WAIN-HOBSON et al., 1985), HTLV-3 (RATNER et al., 1985; MUESING et al., 1985) and of a third isolate designated AIDS-associated retrovirus, ARV (SANCHEZ-PESCADOR et al., 1985). In particular it appeared that, besides the nucleic acid variations responsible for the restriction map polymorphism, isolates could differ significantly at the protein level, especially in the envelope (up to 13% difference between ARV and LAV), by both amino-acid substitutions and reciprocal insertions-deletions (RABSON and MARTIN, 1985).

Nevertheless the differences mentioned above do not go so far as to destroy a sufficient level of immunological relationship, as evidenced by the capability of similar proteins to cross-react immunologically, i.e. core proteins of similar nature, such as the p25 proteins, or similar envelope glycoproteins, such as the 110-120 kD glycoproteins. Accordingly the proteins of any of said LAV viruses can be used for the in vitro detection of antibodies induced in vivo and present in biological fluids obtained from individuals infected with the other LAV variants. Therefore these viruses are grouped in a class of LAV viruses, hereafter generally said to belong to the class of LAV-1 viruses.

The invention stems from the discovery of new viruses which, although held responsible for diseases clinically related to AIDS and still belonging to the class of "LAV-1 viruses", differ genetically to a much larger extent from the above-mentioned LAV variants.

The new viruses are basically characterized by the DNA sequences which are shown in Figures 7A to 7J (LAV ELI) and Figures 8A to 8I (LAV MAL) respectively.
The invention further relates to variants of the new viruses whose RNAs, or the related cDNAs derived from said RNAs, are hybridizable to corresponding parts of the cDNAs of either LAV ELI or LAV MAL.

The invention also relates to the DNAs themselves of said viruses, including DNA fragments derived therefrom hybridizable with the genomic RNA of either LAV ELI or LAV MAL. Particularly said DNAs consist of said cDNAs or cDNA fragments or of recombinant DNAs containing said cDNAs or cDNA fragments.

It further relates to DNA recombinants containing DNAs or cDNA fragments of either LAV ELI or LAV MAL or of related viruses. It is of course understood that fragments which would include some deletions or mutations which would not substantially alter their capability of also hybridizing with the retroviral genomes of LAV ELI or LAV MAL are to be considered as forming obvious equivalents of the DNAs or DNA fragments more specifically referred to hereabove.

The invention also relates more specifically to cloned probes which can be made starting from any DNA fragment according to the invention, thus to recombinant DNAs containing such fragments, particularly any plasmids amplifiable in procaryotic or eucaryotic cells and carrying said fragments.
Using the cloned DNA containing a DNA fragment of LAV ELI or of LAV MAL as a molecular hybridization probe, labelled either with radionucleotides or with fluorescent reagents, LAV virion RNA may be detected directly, e.g. in the blood, body fluids and blood products (e.g. antihemophilic factors such as Factor VIII concentrates). A suitable method for achieving that detection comprises immobilizing virus onto a support, e.g. nitrocellulose filters, disrupting the virion and hybridizing with labelled (radiolabelled or "cold" fluorescent- or enzyme-labelled) probes. Such an approach has already been developed for Hepatitis B virus in peripheral blood (according to SCOTTO J. et al., Hepatology (1983), 3, 379-384).

Probes according to the invention can also be used for rapid screening of genomic DNA derived from the tissue of patients with LAV-related symptoms, to see if the proviral DNA or RNA present in host tissue and other tissues is related to LAV ELI or LAV MAL.

A method which can be used for such screening comprises the following steps: extraction of DNA from tissue, restriction enzyme cleavage of said DNA, electrophoresis of the fragments and Southern blotting of genomic DNA from tissues, and subsequent hybridization with labelled cloned LAV proviral DNA. In situ hybridization can also be used.

Lymphatic fluids and tissues and other non-lymphatic tissues of humans, primates and other mammalian species can also be screened to see if other evolutionarily related retroviruses exist. The methods referred to hereabove can be used, although hybridization and washings would be done under non-stringent conditions.

The DNA according to the invention can also be used for achieving the expression of LAV viral antigens for diagnostic purposes as well as for the production of a vaccine against LAV. Fragments of particular advantage in that respect will be discussed later.

The methods which can be used are multifold:

a) DNA can be transfected into mammalian cells with appropriate selection markers by a variety of techniques: calcium phosphate precipitation, polyethylene glycol, protoplast fusion, etc.;

b) DNA fragments corresponding to genes can be cloned into expression vectors for E. coli, yeast or mammalian cells and the resultant proteins purified;

c) The proviral DNA can be "shot-gunned" (fragmented) into procaryotic expression vectors to generate fusion polypeptides. Recombinants producing antigenically competent fusion proteins can be identified by simply screening the recombinants with antibodies against LAV ELI or LAV MAL antigens.

Particular reference in that respect is made to those portions of the genomes of LAV ELI and LAV MAL which, in the drawings, are shown to belong to open reading frames and which encode the products having the polypeptidic backbones shown.

More particularly, the invention relates to the different polypeptides which appear in Figures 7A to 8I. Methods disclosed in European application 0 178 978 and in PCT application PCT/EP 85/00548 filed on Oct. 18, 1985 are applicable for the production of such peptides from the corresponding viruses.

The present invention further aims at providing polypeptides containing sequences in common with polypeptides comprising antigenic determinants included in the proteins encoded and expressed by the LAV ELI or LAV MAL genome. An additional object of the invention is to provide means for the detection of proteins related to these LAV viruses, particularly for the diagnosis of AIDS or pre-AIDS or, conversely, for the detection of antibodies against the LAV virus or proteins related therewith, particularly in patients afflicted with AIDS or pre-AIDS or more generally in asymptomatic carriers and in blood-related products.

Finally the invention also aims at providing immunogenic polypeptides, and more particularly protective polypeptides, for use in the preparation of vaccine compositions against AIDS or related syndromes.

The invention relates also to polypeptide fragments having lower molecular weights and having peptide sequences or fragments in common with those shown in Figures 7A to 8I. Fragments of smaller sizes may be obtained by resorting to known techniques. For instance such a method comprises cleaving the original larger polypeptide by enzymes capable of cleaving it at specific sites. By way of examples of such enzymes, may be mentioned the enzyme of Staphylococcus aureus V8, α-chymotrypsin, "mouse sub-maxillary gland protease" marketed by the BOEHRINGER company, and Vibrio alginolyticus chemovar iophagus collagenase, which specifically recognizes the peptides Gly-Pro and Gly-Ala, etc.
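By way of illustration only, the choice of a fragmentation enzyme can be guided by first locating its recognition motifs in the polypeptide backbones of Figures 7A to 8I. The minimal Python sketch below lists the positions of the Gly-Pro and Gly-Ala dipeptides cited above for the collagenase. Since the text does not state which peptide bond is actually cleaved, the sketch only reports where each motif starts. The function name `dipeptide_positions` and the example peptide are hypothetical.

```python
import re

def dipeptide_positions(protein, dipeptides=("GP", "GA")):
    """Map each dipeptide (one-letter code) to the 0-based positions of its first residue.

    A zero-width lookahead is used so that overlapping occurrences are all reported.
    """
    seq = protein.upper()
    return {
        dp: [m.start() for m in re.finditer("(?=" + re.escape(dp) + ")", seq)]
        for dp in dipeptides
    }

# Hypothetical peptide, not taken from the figures.
print(dipeptide_positions("MGPAGAGP"))  # {'GP': [1, 6], 'GA': [4]}
```

Other enzymes can be screened the same way by passing their recognition motifs in `dipeptides`.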

Other features of this invention will appear in the following disclosure of the data obtained starting from LAV ELI and LAV MAL, in relation to the drawings in which:

- Figs. 1A and 1B provide restriction maps of the genomes of LAV ELI and LAV MAL as compared to LAV BRU (a known LAV isolate deposited at CNCM under number I-232 on July 15th, 1983);

- Fig. 2 shows the comparative maps setting forth the relative positions of the open reading frames of the above genomes;

- Figs. 3A-3F (sometimes also designated globally hereafter by Fig. 3) indicate the relative correspondence between the proteins (or glycoproteins) encoded by the open reading frames, whereby amino-acid residues of protein sequences of LAV ELI and LAV MAL are in vertical alignment with corresponding amino-acid residues (numbered) of corresponding or homologous proteins or glycoproteins of LAV BRU;

- Figs. 4A-4B (sometimes also designated globally hereafter by Fig. 4) provide for quantitation of the sequence divergence between homologous proteins of LAV MAL, LAV ELI and LAV BRU;

- Fig. 5 shows diagrammatically the degree of divergence of the different virus envelope proteins;

- Figs. 6A and 6B (or Fig. 6 when viewed altogether) render apparent the direct repeats which appear in the proteins of the different AIDS virus isolates;

- Figs. 7A-7J and 8A-8I show the full nucleotide sequences of LAV ELI and LAV MAL respectively.

RESULTS 

Characterization and molecular cloning of two African isolates.

The different AIDS virus isolates concerned are designated by three letters of the patient's name, LAV BRU referring to the prototype AIDS virus isolated in 1983 from a French homosexual patient with LAS and thought to have been infected in the USA in the preceding years (Barre-Sinoussi et al., 1983). Both of the African patients originated from Zaire; LAV ELI was recovered in 1983 from a 24-year-old woman with AIDS, and LAV MAL in 1985 from a 7-year-old boy with ARC, probably infected in 1981 after a blood transfusion in Zaire, since his parents were LAV-seronegative.

Recovery and purification of each of the two viruses were performed according to the method disclosed in European Patent Application 84 401834/138 667 filed on September 9, 1984.

LAV ELI and LAV MAL are indistinguishable from the previously characterized isolates by their structural and biological properties in vitro. Virus metabolic labelling and immune precipitation by the sera of patients ELI and MAL, as well as by reference sera, showed that the proteins of LAV ELI and LAV MAL had the same molecular weight (MW) and cross-reacted immunologically with those of the prototype AIDS virus (data not shown) of the "LAV-1" class.

Reference is again made to European Application 178 978 and International Application PCT/EP 85/00548 as concerns the purification, mapping and sequencing procedures used herein. See also "Experimental procedures" and "Legends of the figures" hereafter.
Primary restriction enzyme analysis of the LAV ELI and LAV MAL genomes was done by Southern blot with total DNA derived from acutely infected lymphocytes, using the cloned complete LAV BRU genome as probe. Overall cross-hybridization was observed under stringent conditions, but the restriction profiles of the Zairian isolates were clearly different. Phage lambda clones carrying the complete viral genetic information were obtained and further characterized by restriction mapping and nucleotide sequence analysis: clone E-H12 is derived from LAV ELI-infected cells and contains an integrated provirus with 5' flanking cellular sequences but a truncated 3' long terminal repeat (LTR); clone M-H11 was obtained by complete HindIII restriction of DNA from LAV MAL-infected cells, taking advantage of the existence of a unique HindIII site in the LTR. M-H11 is thus probably derived from unintegrated viral DNA, since that species was at least ten times more abundant than integrated provirus.
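The cloning of M-H11 relies on the HindIII site being unique within the LTR. As a hypothetical illustration of how such a property can be checked on a sequence, the Python sketch below lists every occurrence of the HindIII recognition site AAGCTT. Because AAGCTT is its own reverse complement, scanning one strand also covers the other. The function names and the demonstration sequences are assumptions, not data from the figures.

```python
HINDIII_SITE = "AAGCTT"  # HindIII recognition sequence (cut between A and AGCTT)

def find_sites(dna, site=HINDIII_SITE):
    """Return the 0-based start of every occurrence of `site` in `dna`.

    A palindromic site such as AAGCTT needs only this one-strand scan; a
    non-palindromic site would also require scanning its reverse complement.
    """
    seq = dna.upper()
    hits = []
    pos = seq.find(site)
    while pos != -1:
        hits.append(pos)
        pos = seq.find(site, pos + 1)  # advance by one so overlapping hits are kept
    return hits

def is_unique_site(dna, site=HINDIII_SITE):
    """True when the site occurs exactly once in the given stretch (e.g. an LTR)."""
    return len(find_sites(dna, site)) == 1

print(find_sites("GGAAGCTTCCAAGCTT"))  # [2, 10]
print(is_unique_site("CCAAGCTTGG"))    # True
```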
Figure 1B gives a comparison of the restriction maps of LAV ELI, LAV MAL and prototype LAV BRU, all three being derived from their nucleotide sequences, as well as of three Zairian isolates previously mapped for seven restriction enzymes (Benn et al., 1985). Despite this limited number, all of the profiles are clearly different (out of the 23 sites making up the map of LAV BRU, only seven are present in all six maps presented), confirming the genetic polymorphism of the AIDS virus. No obvious relationship is apparent between the five Zairian maps, and all of their common sites are also found in LAV BRU.
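The two counts above (sites present in all maps, and whether the sites shared by the Zairian maps also occur in LAV BRU) are set intersections. The Python sketch below assumes, for simplicity, that every map has already been expressed as a set of (enzyme, position) pairs in one common coordinate system with exact matching; real comparisons require an alignment and a positional tolerance. The maps shown are hypothetical and are not the data of Figure 1B.

```python
from functools import reduce

def sites_in_all(maps):
    """Sites (enzyme, position) present in every map; `maps` must be non-empty."""
    return reduce(set.intersection, maps.values())

# Hypothetical maps in a common coordinate system (not the data of Fig. 1B).
maps = {
    "REF": {("HindIII", 100), ("EcoRI", 500), ("BamHI", 900)},
    "ISO1": {("HindIII", 100), ("EcoRI", 500)},
    "ISO2": {("HindIII", 100), ("BamHI", 900), ("KpnI", 1200)},
}
common = sites_in_all(maps)
print(sorted(common))                      # [('HindIII', 100)]
print(f"{len(common)}/{len(maps['REF'])}")  # 1/3 of the reference sites are in all maps

# Are the sites shared by the non-reference maps all present in the reference map?
others = {name: sites for name, sites in maps.items() if name != "REF"}
print(sites_in_all(others) <= maps["REF"])  # True
```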

Conservation of the genetic organization. 



The genetic organization of LAV ELI and LAV MAL, as deduced from the complete nucleotide sequences of their cloned genomes, is identical to that found in other isolates, i.e. 5' gag-pol-central region-env-F 3'. Most noticeable is the conservation of the "central region" (Fig. 2), located between the pol and env genes, which is composed of a series of overlapping open reading frames (orf) we had previously designated Q, R, S, T and U after observing a similar organization in the ovine lentivirus visna (Sonigo et al., 1985). The product of orf S (also designated "tat") is implicated in the transactivation of virus expression (Sodroski et al., 1985; Arya et al., 1985); the biological role of the product of orf Q (also designated "sor" or orf A) is still unknown (Lee et al., 1986; Kang et al., 1986). Of the three other orfs (R, T and U), only orf R is likely to be a seventh viral gene, for the following reasons: the exact conservation of its relative position with respect to Q and S (Fig. 2), the constant presence of a possible splice acceptor and of a consensus AUG initiator codon, its codon usage similar to that of the viral genes, and finally the fact that the variation of its protein sequence within the different isolates is comparable to that of gag, pol and Q (see Fig. 4).

Also conserved are the sizes of the U3, R and U5 elements of the LTR (data not shown), the location and sequence of their regulatory elements such as the TATA box and the AATAAA polyadenylation signal, and their flanking sequences, i.e. the primer binding site (PBS) complementary to the 3' end of tRNA-Lys3 and the polypurine tract (PPT). Most of the genetic variability within the LTR is located in the 5' half of U3 (which encodes a part of orf F), while the 3' end of U3 and R, which carry most of the cis-acting regulatory elements (promoter, enhancer and trans-activating factor receptor; Rosen et al., 1985), as well as the U5 element, are well conserved.

Overall, it clearly appears that the Zairian isolates belong to the same type of retrovirus as the previously sequenced isolates of American or European origin.

Variability of the viral proteins.

Despite their identical genetic organization, these isolates show substantial differences in the primary structure of their proteins. The amino acid sequences of LAV ELI and LAV MAL proteins are presented in Figures 3A-3F (to be examined in conjunction with Figs. 7A-7J and 8A-8I), aligned with those of LAV BRU and ARV 2. Their divergence was quantified as the percentage of amino-acid substitutions in two-by-two alignments (Fig. 4). We have also scored the number of insertions and deletions that had to be introduced in each of these alignments.
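The scoring of a two-by-two alignment can be sketched in Python as follows. The text does not state the denominator of the percentage nor how indels were counted, so this sketch assumes that substitutions are counted over gap-free columns only and that each contiguous run of gap characters (`-`) counts as one insertion or deletion event. The function names and the example alignment are hypothetical.

```python
def count_gap_events(aligned):
    """Number of contiguous runs of '-' (each run counted as one indel event)."""
    events, in_gap = 0, False
    for ch in aligned:
        if ch == "-" and not in_gap:
            events += 1
        in_gap = ch == "-"
    return events

def divergence(a, b):
    """Percent substitutions over gap-free columns, plus indel events, for one pairwise alignment."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have the same length")
    compared = substitutions = 0
    for x, y in zip(a, b):
        if x == "-" or y == "-":
            continue  # gap columns are scored as indels, not as substitutions
        compared += 1
        substitutions += x != y
    percent = 100.0 * substitutions / compared if compared else 0.0
    return percent, count_gap_events(a) + count_gap_events(b)

pct, indels = divergence("MKV-LLAG", "MRVQLL-G")  # hypothetical alignment
print(round(pct, 1), indels)  # 16.7 2
```

Counting each gap run once, rather than each gap column, keeps a single long insertion from being scored like many independent events.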

Three general observations can be made. First, the protein sequences of the African isolates are more divergent from LAV BRU than are those of HTLV-3 and ARV 2 (Fig. 4A); similar results are obtained if ARV 2 is taken as reference (not shown). The range of genetic polymorphism between isolates of the AIDS virus is thus considerably greater than previously observed. Second, our two sequences confirm that the envelope is more variable than the gag and pol genes. Here again, the relatively small difference observed between the env of LAV BRU and HTLV-3 appears as an exception. Third, the mutual divergence of the two African isolates (Fig. 4B) is comparable to that between LAV BRU and either of them; as far as we can extrapolate from only three sequenced isolates from the USA and Europe and two from Africa, this is indicative of a wider evolution of the AIDS virus in Africa.

gag and pol: Their greater degree of conservation compared to the envelope is consistent with their encoding proteins with important structural or enzymatic functions. Of the three mature gag proteins, the p25, which was the first recognized immunogenic protein of LAV (Barre-Sinoussi et al., 1983), is also the best conserved (Fig. 3). In gag and pol, differences between isolates are principally due to point mutations, and only a small number of insertional or deletional events is observed. Among these, we must note the presence in the overlapping part of gag and pol of LAV BRU of an insertion of 12 amino acids (AA) which is encoded by the second copy of a 36 bp direct repeat present only in this isolate and in HTLV-3. This duplication was omitted because of a computing error in the published sequence of LAV BRU (position 1712, Wain-Hobson et al., 1985) but was indeed present in the HTLV-3 sequences (Ratner et al., 1985; Muesing et al., 1985).

env: Three segments can be distinguished in the envelope glycoprotein precursor (Allan et al., 1985; Montagnier et al., 1985; DiMarzo Veronese et al., 1985). The first is the signal peptide (positions 1-33 in Fig. 3), and its sequence appears variable; the second segment (positions 34-530) forms the outer membrane protein (OMP or gp110) and carries most of the genetic variations, and in particular almost all of the numerous reciprocal insertions and deletions; the third segment (531-877) is separated from the OMP by a potential cleavage site following a constant basic stretch (Arg-Glu-Lys-Arg) and forms the transmembrane protein (TMP or gp41) responsible for the anchorage of the envelope glycoprotein in the cellular membrane. A better conservation of the TMP than of the OMP has also been observed between the different murine leukemia viruses (MLV, Koch et al., 1983), and could be due to structural constraints.

From the alignment of Figure 3 and the graphical representation of the envelope variability shown in Figure 5, we clearly see the existence of conserved domains, with little or no genetic variation, and hypervariable domains, in which even the alignment of the different sequences is very difficult because of the large number of mutations and of reciprocal insertions and deletions. We have not included the sequence of the envelope of the HTLV-3 isolate, since it is so close to that of LAV BRU (cf. Fig. 4), even in the hypervariable domains, that it did not add anything to the analysis. While this graphical representation will be refined by more sequence data, the general profile is already apparent, with three hypervariable domains (Hy1, 2 and 3), all located in the OMP and separated by three well-conserved stretches (residues 37-130, 211-289 and 438-530 of the Fig. 3 alignment) probably associated with important biological functions.

In spite of the extreme genetic variability, the folding pattern of the envelope glycoprotein is probably constant. Indeed the position of virtually all of the cysteine residues is conserved within the different isolates (Figs. 3 and 5), and the only three variable cysteines fall either in the signal peptide or in the very C-terminal part of the TMP. The hypervariable domains of the OMP are bounded by conserved cysteines, suggesting that they may represent loops attached to the common folding pattern. Also the calculated hydropathic profiles (Kyte and Doolittle, 1982) of the different envelope proteins are remarkably conserved (not shown).
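A hydropathic profile in the sense of Kyte and Doolittle averages a per-residue hydropathy index over a sliding window. The Python sketch below uses the published Kyte-Doolittle index values; the default window of 9 residues is an assumption, since the text does not state the window used, and the function name is hypothetical. Profiles of the different envelopes would be computed on ungapped sequences and then compared position by position through the alignment.

```python
# Kyte-Doolittle (1982) hydropathy index for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

def hydropathy_profile(protein, window=9):
    """Mean hydropathy of each sliding window, one value per window start.

    Non-standard letters (gaps, X) raise KeyError, so pass ungapped sequences.
    """
    values = [KYTE_DOOLITTLE[aa] for aa in protein.upper()]
    if not 1 <= window <= len(values):
        raise ValueError("window must be between 1 and the sequence length")
    return [sum(values[i:i + window]) / window for i in range(len(values) - window + 1)]

print([round(v, 2) for v in hydropathy_profile("IIIKKK", window=3)])  # [4.5, 1.7, -1.1, -3.9]
```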

More than half of the potential N-glycosylation sites (Asn-X-Ser/Thr) found in the envelopes of the Zairian isolates map to the same positions in LAV BRU (17/26 for LAV ELI and 17/28 for LAV MAL). The other sites appear to fall within variable domains of env, suggesting the existence of differences in the extent of envelope glycosylation between different isolates.
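The counts above combine two steps: finding Asn-X-Ser/Thr motifs in each envelope, then asking which of them fall at the same position of the alignment. The Python sketch below is a hypothetical illustration of both steps. It scans the ungapped protein (the real polypeptide), maps each site back to its alignment column, and intersects the columns. The optional exclusion of Pro at position X is a common refinement that is not stated in the text and is therefore off by default. All names and sequences are assumptions.

```python
import re

def glycosylation_sites(protein, exclude_pro=False):
    """0-based positions of Asn in Asn-X-Ser/Thr motifs of an ungapped sequence.

    The lookahead reports overlapping motifs; `exclude_pro` optionally forbids
    Pro at X, a refinement not stated in the text.
    """
    x = "[A-OQ-Z]" if exclude_pro else "[A-Z]"
    return [m.start() for m in re.finditer(f"(?=N{x}[ST])", protein.upper())]

def residue_columns(aligned):
    """Alignment column of each residue of the ungapped sequence."""
    return [col for col, ch in enumerate(aligned) if ch != "-"]

def shared_sites(aligned_a, aligned_b):
    """(sites of A at the same alignment column as a site of B, total sites of A)."""
    sites = []
    for aligned in (aligned_a, aligned_b):
        cols = residue_columns(aligned)
        sites.append({cols[i] for i in glycosylation_sites(aligned.replace("-", ""))})
    return len(sites[0] & sites[1]), len(sites[0])

print(glycosylation_sites("MNASKNPTLNGT"))                    # [1, 5, 9]
print(glycosylation_sites("MNASKNPTLNGT", exclude_pro=True))  # [1, 9]
print(shared_sites("NAS-NGT", "NASKQGT"))                     # (1, 2)
```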
Other viral proteins: Of the three other identified viral proteins, the p27 encoded by orf F, 3' of env (Allan et al., 1985b), is the most variable (Fig. 4). The proteins encoded by orfs Q and S of the central region are remarkable for their absence of insertions/deletions. Surprisingly, a high frequency of amino-acid substitutions, comparable to that observed in env, is found for the product of orf S (trans-activating factor). On the other hand, the protein encoded by orf Q is no more variable than gag. Also noticeable is the lower variation of the proteins encoded by the central regions of LAV ELI and LAV MAL.

DISCUSSION

With the availability of the complete nucleotide sequence of five independent isolates, some general features of the AIDS virus genetic variability are now emerging. Firstly, its principal cause is point mutations, very often resulting in amino-acid substitutions, which are more frequent in the 3' part of the genome (orf S, env and orf F). Like all RNA viruses, the retroviruses are thought to be highly subject to mutations caused by errors of the RNA polymerases during their replication, since there is no proofreading of this step (Holland et al., 1982; Steinhauer and Holland, 1986).

Another source of genetic diversity is insertions/deletions. From the Fig. 3 alignments, insertional events seem to be implicated in most of the cases, since otherwise deletions would have had to occur in independent isolates at precisely the same location. Furthermore, upon analyzing these insertions, we have observed that they most often represent one of the two copies of a direct repeat (Fig. 6). Some are perfectly conserved, like the 36 bp repeat in the gag-pol overlap of LAV BRU (Fig. 6-a); others carry point mutations resulting in amino-acid substitutions and, as a consequence, are more difficult to observe, though clearly present, in the hypervariable domains of env (cf. Fig. 6-g and -h). As noted for point mutations, the env gene and orf F also appear more susceptible to that form of genetic variation than the rest of the genome.

The degree of conservation of these repeats must be related to their date of occurrence in the analyzed sequences: the more degenerated, the more ancient. A very recent divergence of LAV BRU and HTLV-3 is suggested by the extremely low number of mismatched AA between their homologous proteins. However, one of the LAV BRU repeats (located in the Hy1 domain of env, Fig. 6-f) is not present in HTLV-3, indicating that this generation of tandem repeats is a rapid source of genetic diversity. We have found no traces of such a phenomenon, even when comparing very closely related viruses, such as the Mason-Pfizer monkey virus, MPMV (Sonigo et al., 1986), and an immunosuppressive simian virus, SRV-1 (Power et al., 1986). Insertion or deletion of one copy of a direct repeat has occasionally been reported in mutant retroviruses (Shimotohno and Temin, 1981; Darlix, 1986), but the extent to which we observe this phenomenon is unprecedented.
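The repeats discussed above are direct repeats whose two copies lie side by side. The brute-force Python sketch below is a hypothetical illustration of how such tandem duplications can be searched for in a nucleotide or protein sequence: every window of length k is compared with the window immediately following it. Allowing a few mismatches lets degenerated (older) repeats be detected as well, at the cost of overlapping, redundant hits. The length bounds and the demonstration sequence are assumptions. A perfectly conserved 36 bp repeat such as the gag-pol one of LAV BRU would be reported with a unit length of 36 and 0 mismatches.

```python
def tandem_repeats(seq, min_len=6, max_len=60, max_mismatches=0):
    """Find direct repeats whose two copies are adjacent (unit, then its copy).

    Returns (start, unit_length, mismatches) for every window where
    seq[start:start+k] and seq[start+k:start+2k] differ at no more than
    `max_mismatches` positions.
    """
    seq = seq.upper()
    hits = []
    for k in range(min_len, min(max_len, len(seq) // 2) + 1):
        for i in range(len(seq) - 2 * k + 1):
            first, second = seq[i:i + k], seq[i + k:i + 2 * k]
            mismatches = sum(a != b for a, b in zip(first, second))
            if mismatches <= max_mismatches:
                hits.append((i, k, mismatches))
    return hits

demo = "GGACGTTAACGTTACC"  # hypothetical: ACGTTA repeated at positions 2 and 8
print(tandem_repeats(demo))                    # [(2, 6, 0)]
print(tandem_repeats(demo, max_mismatches=1))  # [(1, 6, 1), (2, 6, 0), (3, 6, 1), (4, 6, 1)]
```

The exhaustive scan is simple but slow for whole genomes; it is meant to show the criterion, not to replace dedicated repeat-finding tools.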

The molecular basis of these duplications is unclear, but could be the "copy-choice" phenomenon, resulting from the diploidy of the retroviral genome (Varmus and Swanstrom, 1984; Clark and Mak, 1983). During the synthesis of the first strand of the viral DNA, jumps are known to occur from one RNA molecule to another, especially when a break or a stable secondary structure is present on the template; an inaccurate re-initiation on the other RNA template could result in the generation (or the elimination) of a short direct repeat.
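The consequence of an inaccurate re-initiation can be made concrete with a toy string model, given here as a hypothetical illustration rather than a model of reverse transcription. It assumes that the two RNA copies are identical and writes sequences in copying order, ignoring strand polarity. If copying leaves the first template at one position and resumes on the second template at an earlier position, the segment in between is copied twice and appears as a direct repeat; if it resumes at a later position, that segment is eliminated.

```python
def template_switch(template, jump_from, resume_at):
    """Toy copy-choice event between two identical genome copies.

    Copying proceeds on the first copy up to (not including) `jump_from`, then
    re-initiates on the second copy at `resume_at`. Sequences are written in
    copying order; strand polarity is ignored.
    """
    if not (0 <= jump_from <= len(template) and 0 <= resume_at <= len(template)):
        raise ValueError("positions must lie within the template")
    return template[:jump_from] + template[resume_at:]

print(template_switch("ABCDEFGH", 5, 2))  # ABCDECDEFGH: "CDE" duplicated as a direct repeat
print(template_switch("ABCDEFGH", 2, 5))  # ABFGH: "CDE" eliminated
```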

Genetic variability, and subsequent antigenic modifications, have often been developed by microorganisms as a means to escape the host's immune response, either by modifying their epitopes during the course of the infection, as in trypanosomes (Borst and Cross, 1982), or by generating a large repertoire of antigens, as observed in influenza virus (Webster et al., 1982). As the human AIDS virus is related to animal lentiviruses (Sonigo et al., 1985; Chiu et al., 1985), its genetic variability could be a source of antigenic variation, as can be observed during the course of infection by the ovine lentivirus visna (Scott et al., 1979; Clements et al., 1980) or by the equine infectious anemia virus (EIAV, Montelaro et al., 1984). However, a major discrepancy with these animal models is the extremely low, if any, neutralizing activity of the sera of individuals infected by the AIDS virus, whether they are healthy carriers, display minor symptoms or are afflicted with AIDS (Weiss et al., 1985; Clavel et al., 1985). Furthermore, even for the visna virus the exact role of antigenic variation in the pathogenesis is unclear (Thormar et al., 1983; Lutley et al., 1983). We rather feel that genetic variation represents a general selective advantage for lentiviruses by allowing an adaptation to different environments, for example by modifying their tissue or host tropisms. In the particular case of the AIDS virus, rapid genetic variations are tolerated, especially in the envelope; they could allow the virus to adapt to different "micro-environments" of the membrane of its principal target cells, namely the T4 lymphocytes. These "micro-environments" could result from the immediate vicinity of the virus receptor to polymorphic surface proteins, differing either between individuals or between clones of lymphocytes.

Conserved domains in the AIDS virus envelope.

Since the proteins of most of the isolates are antigenically cross-reactive, the genotypic differences do not seem to affect the sensitivity of current diagnostic tests, which are based upon the detection of antibodies to the AIDS virus and use purified virions as antigens. They nevertheless have to be considered for the development of "second-generation" tests, which are expected to be more specific and will use smaller synthetic or genetically engineered viral antigens. The identification of conserved domains in the highly immunogenic envelope glycoprotein, and also in the core structural proteins (gag), is very important for these tests. The conserved stretch found at the end of the OMP and the beginning of the TMP (490-620, Fig. 3) could be a good candidate, since a bacterial fusion protein containing this domain was well detected by the sera of AIDS patients (Chang et al., 1985).

The envelope, specifically the OMP, mediates the interaction between a retrovirus and its specific cellular receptor (DeLarco and Todaro, 1976; Robinson et al., 1980). In the case of the AIDS virus, in vitro binding assays have shown the interaction of the envelope glycoprotein gp110 with the T4 cellular surface antigen (McDougal et al., 1986), already thought to be, or to be closely associated with, the virus receptor (Klatzmann et al., 1984; Dalgleish et al., 1984). Identification of the AIDS virus envelope domains responsible for this interaction (receptor-binding domains) appears fundamental not only for understanding the host-viral interactions, but also for designing a protective vaccine, since an immune response against these epitopes could possibly elicit neutralizing antibodies. As the AIDS virus receptor is at least partly formed of a constant structure, the T4 antigen, the binding site of the envelope is unlikely to be exclusively encoded by domains undergoing drastic genetic changes between isolates, even if these could be implicated in some kind of "adaptation". One or several of the conserved domains of the OMP (residues 37-130, 211-289 and 488-530 of the Fig. 3 alignment), brought together by the folding of the protein, must play a part in the virus-receptor interaction, and this can be explored with synthetic or genetically engineered peptides derived from these domains, either by direct binding assays or indirectly by assaying the neutralizing activity of specific antibodies raised against them.

African AIDS viruses

Zaire and the neighbouring countries of Central Africa are considered an area of endemic AIDS virus infection, and the possibility that the virus emerged in Africa has become a subject of intense controversy (see Norman, 1985). From the present study, it is clear that the genetic organization of Zairian isolates is the same as that of American isolates, thereby indicating a common origin. The very large sequence differences observed between the proteins are consistent with a divergent evolutionary process. In addition, the two African isolates are mutually more divergent than the American isolates already analyzed; as far as that observation can be extrapolated, it suggests a longer evolution of the virus in Africa, and is also consistent with the fact that a larger fraction of the population is exposed than in developed countries.

A novel human retrovirus with morphology and
biological properties (cytopathogenicity, T4 tropism)
similar to those of LAV, but nevertheless clearly
genetically and antigenically distinct from the latter,
was recently isolated from two patients with AIDS
originating from Guinea-Bissau, West Africa (Clavel et
al., 1986). In neighbouring Senegal, the population
seems exposed to a retrovirus also distinct from LAV,
but apparently non-pathogenic (Barin et al., 1985;
Kanki et al., 1986). Both of these novel African
retroviruses seem to be antigenically related to the
simian T-cell lymphotropic virus, STLV-III, shown to be
widely present in healthy African green monkeys and
other simian species (Kanki et al., 1985). This raises
the possibility of a large group of African primate
lentiviruses, ranging from the apparently non-pathogenic
simian viruses to the LAV-type viruses. Their precise
relationship will only be known after their complete
genetic characterization, but it is already very likely
that they have evolved from a common progenitor. The
important genetic variability we have observed between
isolates of the AIDS virus in Central Africa is probably
a hallmark of this entire group, and may account for the
apparently important genetic divergence between its
members (loss of cross-antigenicity in the envelopes).
In this sense, the conservation of the tropism for T4
lymphocytes suggests that it is a major advantage
acquired by these retroviruses.
EXPERIMENTAL PROCEDURES 

Virus isolations 

LAV ELI and LAV MAL were isolated from the
peripheral blood lymphocytes of the patients as
described (Barre-Sinoussi et al., 1983); briefly, the
lymphocytes were fractionated and co-cultivated with
phytohaemagglutinin-stimulated normal human lymphocytes
in the presence of interleukin 2 and anti-alpha
interferon serum. Viral production was assessed by
cell-free reverse transcriptase (RT) activity assays on
the cultures and by electron microscopy.
Molecular cloning 

Normal donor lymphocytes were acutely infected
(10^4 cpm of RT activity per 10^6 cells) as described
(Barre-Sinoussi et al., 1983), and total DNA was
extracted at the beginning of the RT activity peak. For
LAV ELI, a lambda library using the L47-1 vector (Loenen
and Brammar, 1982) was constructed by partial HindIII
digestion of the DNA as already described (Alizon et
al., 1984). For LAV MAL, DNA from infected cells was
digested to completion with HindIII and the 9-10 kb
fraction was selected on a 0.8 % low melting point
agarose gel and ligated into L47-1 HindIII arms. About
5 x 10^5 plaques for LAV ELI and 2 x 10^5 for LAV MAL,
obtained by in vitro packaging (Amersham), were plated
on E. coli LA101 and screened in situ under stringent
conditions, using as probe the 9 kb SacI insert of the
clone lambda J19 (Alizon et al., 1984), which carries
most of the LAV BRU genome. Clones displaying positive
signals were plaque-purified and propagated on E. coli
C600 recBC, and two recombinant phages carrying the
complete genetic information of LAV ELI (E-H12) and
LAV MAL (M-H11) were further characterized by
restriction mapping.
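As an illustrative aside, not part of the original protocol, the logic of a complete HindIII digest followed by selection of a 9-10 kb fraction can be sketched in silico. The sketch assumes a linear sequence and uses the HindIII recognition site AAGCTT, cut between the two A residues (A^AGCTT); the function names and the toy sequence are hypothetical, and the actual size selection was performed on an agarose gel.

```python
import re

# HindIII recognizes 5'-AAGCTT-3' and cuts after the first A (A^AGCTT).
HINDIII_SITE = "AAGCTT"
CUT_OFFSET = 1


def hindiii_fragments(seq: str) -> list[int]:
    """Fragment lengths (bp) of a complete HindIII digest of a linear DNA."""
    seq = seq.upper()
    # Zero-width lookahead so that adjacent sites are all found.
    cuts = [m.start() + CUT_OFFSET
            for m in re.finditer(f"(?={HINDIII_SITE})", seq)]
    bounds = [0] + cuts + [len(seq)]
    return [end - start for start, end in zip(bounds, bounds[1:]) if end > start]


def size_select(fragments: list[int], low: int = 9000, high: int = 10000) -> list[int]:
    """Keep fragments within [low, high] bp, mimicking gel fraction selection."""
    return [f for f in fragments if low <= f <= high]


if __name__ == "__main__":
    # Hypothetical toy sequence with two HindIII sites.
    toy = "G" * 100 + HINDIII_SITE + "C" * 9500 + HINDIII_SITE + "T" * 50
    frags = hindiii_fragments(toy)
    print(frags)               # [101, 9506, 55]
    print(size_select(frags))  # [9506]
```

A partial digest, as used for the other library, would instead retain some uncut sites; modelling it would require sampling which sites are cut, which this sketch deliberately leaves out.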

Nucleotide sequence strategy

Viral fragments derived from E-H12 and M-H11
were sequenced by the dideoxy chain terminator procedure
(Sanger et al., 1977) after "shotgun" cloning in the
M13mp8 vector (Messing and Vieira, 1982), as previously
described (Sonigo et al., 1985). The viral genome of
LAV ELI is 9176 nucleotides long, that of LAV MAL 9229
nucleotides long. Each nucleotide was determined from
more than 5 independent clones on average. Complete
nucleotide sequences are not presented in this article
for obvious reasons of space limitation but are freely
available upon request to the authors, until they are
released through sequence data banks.
LEGEND OF THE FIGURES 

Figure 1 : Restriction map analysis of AIDS virus
isolates.

A/ Restriction maps of the inserts of phage
lambda clones derived from cells infected with LAV ELI
(E-H12) and LAV MAL (M-H11). The schematic
genetic organization of the AIDS virus has been drawn
above the maps. The LTRs are indicated by solid boxes.
A: AvaI; B: BamHI; Bg: BglII; E: EcoRI; H: HindIII;
Hc: HincII; K: KpnI; N: NdeI; P: PstI; S: SacI;
X: XbaI. Asterisks indicate the HindIII cloning sites
in the lambda L47-1 vector.

B/ Comparison of the sites for seven restriction
enzymes in six isolates: the prototype AIDS virus
LAV BRU, LAV MAL and LAV ELI; Z1, Z2 and Z3 are Zairian
isolates with published restriction maps (Benn et al.,
1985). Restriction sites are represented by a distinct
symbol for each of the following enzymes: BglII; EcoRI;
HincII; HindIII; KpnI; NdeI; SacI.

Figure 2 : Conservation of the genetic organization of
the central region in AIDS virus isolates.
Stop codons in each phase are represented as
vertical bars. Vertical arrows indicate possible AUG
initiation codons. Splice acceptor (A) and donor (D)
sites identified in subgenomic viral mRNA (Muesing et
al., 1985) are shown below the graphic of LAV BRU, and
the corresponding sites in LAV ELI and LAV MAL are
indicated. PPT indicates the repeat of the polypurine
tract flanking the 3' LTR. As observed in LAV BRU
(Wain-Hobson et al., 1985), the PPT is repeated 256
nucleotides 5' to the end of the pol gene in both our
sequences, but this repeat is degenerate at two
positions in LAV
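Figure 2 marks stop codons in each of the three reading phases and candidate AUG initiation codons. A minimal sketch of how such marks can be computed from a sequence is given below. It assumes a plain sense-strand sequence string (U read as T) and reports 0-based positions; the function names and the toy sequence are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def normalize(seq: str) -> str:
    """Upper-case the sequence and read RNA (U) as DNA (T)."""
    return seq.upper().replace("U", "T")


def stops_by_phase(seq: str) -> dict[int, list[int]]:
    """0-based start positions of stop codons in reading phases 0, 1 and 2."""
    seq = normalize(seq)
    return {
        phase: [i for i in range(phase, len(seq) - 2, 3)
                if seq[i:i + 3] in STOP_CODONS]
        for phase in range(3)
    }


def atg_positions(seq: str) -> list[int]:
    """0-based positions of every ATG (AUG) codon, in any phase."""
    seq = normalize(seq)
    return [i for i in range(len(seq) - 2) if seq[i:i + 3] == "ATG"]


if __name__ == "__main__":
    example = "AUGAAAUAGC"  # hypothetical toy RNA
    print(stops_by_phase(example))  # {0: [6], 1: [1], 2: []}
    print(atg_positions(example))   # [0]
```

Plotting the three lists as vertical bars on parallel tracks, with arrows at the ATG positions, reproduces the kind of display used in the figure.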
Figure 3 : Alignment of the protein sequences of four
AIDS virus isolates.
Isolate LAV BRU (Wain-Hobson et al., 1985) is
taken as reference; only differences from LAV BRU are
noted for ARV2 (Sanchez-Pescador et al., 1985) and the
two Zairian isolates LAV ELI and LAV MAL. A minimal
number of gaps (-) was introduced in the alignments.
The NH2-termini of p25gag and p18gag are indicated
(Sanchez-Pescador et al., 1985). The potential cleavage
sites in the envelope precursor (Allan et al., 1985a;
diMarzo-Veronese, 1985) separating the signal peptide
(SP), the outer membrane protein (OMP) and the
transmembrane protein (TMP) are indicated as vertical
arrows; conserved cysteines are indicated by black
circles and variable cysteines are boxed. The one-letter
code for amino acids is: A: Ala; C: Cys; D: Asp;
E: Glu; F: Phe; G: Gly; H: His; I: Ile; K: Lys; L: Leu;
M: Met; N: Asn; P: Pro; Q: Gln; R: Arg; S: Ser; T: Thr;
V: Val; W: Trp; Y: Tyr.

Figure 4 : Quantitation of the sequence divergence
between homologous proteins of different isolates.

Part A of each table gives results deduced from
two-by-two alignments using the proteins of LAV BRU as
reference, part B those of LAV ELI as reference.
Sources: Muesing et al., 1985 for HTLV-3;
Sanchez-Pescador et al., 1985 for ARV2 and Wain-Hobson
et al., 1985 for LAV BRU. For each cell of the tables,
the size in amino acids of the protein (calculated from
the first methionine residue, or from the beginning of
the orf for Pol) is given at the upper left. Below are
given the number of deletions (left) and insertions
(right) necessary for the alignment. The large numbers
in bold face represent the percentage of amino acid
substitutions (insertions/deletions being excluded).
Two-by-two alignments were done with computer assistance
(Wilbur and Lipman, 1983), using a gap penalty of 1, a
k-tuple of 1 and a window of 20, except for the
hypervariable domains of env, where the number of gaps
was kept to a minimum and which are essentially aligned
as shown in Fig. 3. The sequence of the predicted
protein encoded by orf R of HTLV-3 has not been compared
because of a premature termination relative to all
other isolates.
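The substitution percentages of Figure 4 exclude insertions and deletions. A minimal sketch of that bookkeeping for one pair of pre-aligned protein sequences follows. It assumes '-' as the gap character and counts insertions and deletions per aligned residue, which may differ from how the original tables count them (for example per gap event); the function name is hypothetical, and the alignment itself (Wilbur-Lipman in the original) is taken as given.

```python
def pairwise_divergence(ref_aln: str, other_aln: str,
                        gap: str = "-") -> tuple[float, int, int]:
    """Return (% substitutions, deletions, insertions) for two aligned rows.

    Deletions: residues present in the reference but gapped in the other row.
    Insertions: residues present in the other row but gapped in the reference.
    Columns involving a gap are excluded from the substitution percentage.
    """
    if len(ref_aln) != len(other_aln):
        raise ValueError("aligned sequences must have equal length")
    deletions = insertions = substitutions = compared = 0
    for a, b in zip(ref_aln, other_aln):
        if a == gap and b == gap:
            continue  # column gapped in both rows: ignore
        if b == gap:
            deletions += 1
        elif a == gap:
            insertions += 1
        else:
            compared += 1
            if a != b:
                substitutions += 1
    percent = 100.0 * substitutions / compared if compared else 0.0
    return percent, deletions, insertions


if __name__ == "__main__":
    # Hypothetical toy alignment, reference row first.
    print(pairwise_divergence("MKV-LA", "MRVQL-"))  # (25.0, 1, 1)
```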
Figure 5 : Variability of the AIDS virus envelope
protein.

For each position x of the alignment of env
(Fig. 3), the variability V(x) was calculated as

    V(x) = n(x) / f(x)

where n(x) is the number of different amino acids at
position x and f(x) is the frequency (as a fraction of
the aligned sequences) of the most abundant amino acid
at position x.

Gaps in the alignments are considered as another
amino acid. For an alignment of 4 proteins, V(x) ranges
from 1 (identical AA in the 4 sequences) to 16
(4 different AA). This type of representation has
previously been used in a compilation of the AA
sequences of immunoglobulin variable regions (Wu and
Kabat, 1970). Vertical arrows indicate the cleavage
sites; asterisks represent potential N-glycosylation
sites (N-X-S/T) conserved in all four isolates; black
triangles represent conserved cysteine residues. Black
lozenges mark the three major hydrophobic domains.
OMP: outer membrane protein; TMP: transmembrane protein;
signal: signal peptide; Hy1, 2, 3: hypervariable
domains.
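The variability index V(x) can be computed column by column. The sketch below assumes the aligned sequences are equal-length strings with '-' for gaps (treated, as in the legend, as one more amino acid) and takes f(x) as a proportion, which is what makes V(x) range from 1 to 16 for four sequences. Function names and the toy alignment are hypothetical, not real env data.

```python
from collections import Counter


def variability(column: str) -> float:
    """V(x) = n(x) / f(x) for one alignment column.

    n(x): number of different residues (gaps '-' count as a residue type);
    f(x): proportion of sequences carrying the most abundant residue.
    """
    counts = Counter(column)
    n_different = len(counts)
    f_most_abundant = max(counts.values()) / len(column)
    return n_different / f_most_abundant


def variability_profile(aligned: list[str]) -> list[float]:
    """V(x) for every position x of equal-length aligned sequences."""
    length = len(aligned[0])
    if any(len(s) != length for s in aligned):
        raise ValueError("sequences must be aligned to the same length")
    return [variability("".join(s[x] for s in aligned)) for x in range(length)]


if __name__ == "__main__":
    toy = ["ACKW", "ADKW", "AERW", "AF-W"]
    print(variability_profile(toy))  # [1.0, 16.0, 6.0, 1.0]
```

Peaks of this profile correspond to the hypervariable domains; conserved stretches stay at or near 1.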
Figure 6 : Direct repeats in the proteins of different
AIDS virus isolates.
These examples are derived from the aligned
sequences of gag (a, b), F (c, d) and env (e, f, g, h)
shown in Figure 3. The two elements of each direct
repeat are boxed, while degenerate positions are
underlined.

The invention thus pertains more specifically
to the proteins, polypeptides or glycoproteins including
the polypeptidic structures shown in the drawings. The
first and last amino acid residues of these proteins,
polypeptides or glycoproteins carry numbers computed
from the first amino acid of the open reading frames
concerned, although these numbers do not correspond
exactly to those of the LAV ELI or LAV MAL proteins
concerned, but rather to those of the corresponding
LAV BRU proteins or sequences shown in Figs. 3A, 3B and
3C. Thus a number corresponding to a "first amino acid
residue" of a LAV ELI or LAV MAL protein corresponds to
the number of the first aminoacyl residue of the
corresponding LAV BRU protein which, in any of Figs. 3A,
3B or 3C, is in direct alignment with the corresponding
first amino acid of the LAV ELI or LAV MAL protein. Thus
the sequences concerned can be read from Figs. 7A-7J and
8A-8I, to the extent that they do not appear with
sufficient clarity from Figs. 3A-3F.
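The numbering convention just described, in which each LAV ELI or LAV MAL residue takes the number of the LAV BRU residue aligned with it, can be sketched as follows. The sketch assumes the two sequences are given as rows of an alignment such as Fig. 3, with '-' for gaps. Residues facing a gap in LAV BRU have no LAV BRU counterpart and are mapped to None here, a choice the text does not address. The function name and the toy alignment are hypothetical.

```python
from typing import Optional


def reference_numbering(ref_aln: str, query_aln: str,
                        gap: str = "-") -> dict[int, Optional[int]]:
    """Map 1-based query residue numbers to 1-based reference (LAV BRU) numbers."""
    if len(ref_aln) != len(query_aln):
        raise ValueError("aligned sequences must have equal length")
    mapping: dict[int, Optional[int]] = {}
    ref_pos = query_pos = 0
    for r, q in zip(ref_aln, query_aln):
        if r != gap:
            ref_pos += 1
        if q != gap:
            query_pos += 1
            # A query residue opposite a reference gap has no reference number.
            mapping[query_pos] = ref_pos if r != gap else None
    return mapping


if __name__ == "__main__":
    # Hypothetical toy alignment: reference row first, query row second.
    print(reference_numbering("MA-KV", "M-QKV"))  # {1: 1, 2: None, 3: 3, 4: 4}
```

With such a mapping, a stretch quoted as "residues 37-130" designates, in each isolate, the residues whose mapped LAV BRU numbers fall in that interval.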

The preferred protein sequences of this
invention extend from the corresponding "first" and
"last" amino acid residues (reference is also made to
the protein(s)- or glycoprotein(s)-portions including
part of these sequences) listed below:

OMP or gp110 proteins, including precursors:
    1-530
OMP or gp110 without precursor:
    34-530
Sequence carrying the TMP or gp41 protein:
    531-877, particularly 680-700
Well conserved stretches of the OMP:
    37-130, 211-289 and 488-530
Well conserved stretch found at the end of the OMP and
the beginning of the TMP:
    490-620.
Proteins containing or consisting of the "well
conserved stretches" are of particular interest for the
production of immunogenic compositions and (preferably
in relation to the stretches of the env protein) of
vaccine compositions against the LAV viruses of class 1
as defined above.

The invention concerns more particularly all
the DNA fragments which have been more specifically
referred to in the drawings and which correspond to open
reading frames. It will be understood that the man
skilled in the art will be able to obtain them all, for
instance by cleaving an entire DNA corresponding to the
complete genome of either LAV ELI or LAV MAL, such as by
a partial or complete digestion thereof with a suitable
restriction enzyme, and by the subsequent recovery of
the relevant fragments. The different DNAs disclosed
above can also be resorted to as a source of suitable
fragments. The techniques disclosed in the PCT
application for the isolation of the fragments, which
can then be included in suitable plasmids, are
applicable here too.

Of course other methods can be used. Some of
them have been exemplified in European Application No.
178,978 filed on September 17, 1985. Reference is for
instance made to the following methods.

a) DNA can be transfected into mammalian cells
with appropriate selection markers by a variety of
techniques: calcium phosphate precipitation,
polyethylene glycol, protoplast fusion, etc.
b) DNA fragments corresponding to genes can be
cloned into expression vectors for E. coli, yeast or
mammalian cells and the resultant proteins purified.
c) The proviral DNA can be "shotgunned"
(fragmented) into procaryotic expression vectors to
generate fusion polypeptides. Recombinants producing
antigenically competent fusion proteins can be
identified by simply screening the recombinants with
antibodies against LAV antigens.

The invention further refers more specifically
to DNA recombinants, particularly modified vectors,
including any of the preceding DNA sequences and adapted
to transform corresponding microorganisms or cells,
particularly eucaryotic cells such as yeasts, for
instance Saccharomyces cerevisiae, or higher eucaryotic
cells, particularly cells of mammals, and to permit
expression of said DNA sequences in the corresponding
microorganisms or cells. General methods of that type
have been recalled in the abovesaid PCT international
patent application PCT/EP 85/00548 filed on October 18,
1985.

More particularly the invention relates to
such DNA recombinant vectors modified by the abovesaid
DNA sequences and which are capable of transforming
higher eucaryotic cells, particularly mammalian cells.
Preferably any of the abovesaid sequences are placed
under the direct control of a promoter contained in said
vectors and which is recognized by the polymerases of
said cells, such that the first codons expressed
correspond to the first triplets of the above-defined
DNA sequences. Accordingly this invention also relates
to the corresponding DNA fragments which can be obtained
from the genomes of LAV ELI or LAV MAL or corresponding
cDNAs by any appropriate method. For instance such a
method comprises cleaving said LAV genomes or cDNAs by
restriction enzymes, preferably at the level of
restriction sites surrounding said fragments and close
to the opposite extremities respectively thereof,
recovering and identifying the fragments sought
according to sizes, if need be checking their
restriction maps or nucleotide sequences (or by reaction
with monoclonal antibodies specifically directed against
epitopes carried by the polypeptides encoded by said DNA
fragments), and further, if need be, trimming the
extremities of the fragment, for instance by an
exonucleolytic enzyme such as Bal31, for the purpose of
controlling the desired nucleotide sequences of the
extremities of said DNA fragments or, conversely,
repairing said extremities with Klenow enzyme and
possibly ligating the latter to synthetic polynucleotide
fragments designed to permit the reconstitution of the
nucleotide extremities of said fragments. Those
fragments may then be inserted in any of said vectors
for causing the expression of the corresponding
polypeptide by the cell transformed therewith. The
corresponding polypeptide can then be recovered from the
transformed cells, if need be after lysis thereof, and
purified by methods such as electrophoresis. Needless to
say, all conventional methods for performing these
operations can be resorted to.

The invention also relates more specifically
to cloned probes which can be made starting from any DNA
fragment according to this invention, thus to
recombinant DNAs containing such fragments, particularly
any plasmids amplifiable in procaryotic or eucaryotic
cells and carrying said fragments.

Using the cloned DNA fragments as a molecular
hybridization probe - either by labelling with
radionucleotides or with fluorescent reagents - LAV
virion RNA may be detected directly in the blood, body
fluids and blood products (e.g. antihemophilic factors
such as Factor VIII concentrates) and vaccines, e.g.
hepatitis B vaccine. It has already been shown that
whole virus can be detected in culture supernatants of
LAV-producing cells. A suitable method for achieving
that detection comprises immobilizing virus onto a
support, e.g. nitrocellulose filters, disrupting the
virion and hybridizing with labelled (radiolabelled or
"cold" fluorescent- or enzyme-labelled) probes. Such an
approach has already been developed for Hepatitis B
virus in peripheral blood (Scotto J. et al.,
Hepatology (1983), 3, 379-384).

Probes according to the invention can also be
used for rapid screening of genomic DNA derived from the
tissue of patients with LAV-related symptoms, to see if
the proviral DNA or RNA present in host tissue and other
tissues can be related to that of LAV ELI or LAV MAL.

A method which can be used for such screening
comprises the following steps: extraction of DNA from
tissue, restriction enzyme cleavage of said DNA,
electrophoresis of the fragments and Southern blotting
of genomic DNA from tissues, and subsequent
hybridization with labelled cloned LAV proviral DNA.
In situ hybridization can also be used.

Lymphatic fluids and tissues and other
non-lymphatic tissues of humans, primates and other
mammalian species can also be screened to see if other
evolutionarily related retroviruses exist. The methods
referred to hereabove can be used, although
hybridization and washings would be done under
non-stringent conditions.

The DNAs or DNA fragments according to the
invention can also be used for achieving the expression
of viral antigens of LAV ELI or LAV MAL for diagnostic
purposes.

The invention relates generally to the
polypeptides themselves, whether synthesized chemically,
isolated from viral preparations or expressed by the
different DNAs of the invention, particularly by the
ORFs or fragments thereof, in appropriate hosts,
particularly procaryotic or eucaryotic hosts, after
transformation thereof with a suitable vector previously
modified by the corresponding DNAs.

More generally, the invention also relates to
any of the polypeptide fragments (or molecules,
particularly glycoproteins having the same polypeptidic
backbone as the polypeptides mentioned hereabove)
bearing an epitope characteristic of a protein or
glycoprotein of LAV ELI or LAV MAL, which polypeptide or
molecule then has N-terminal and C-terminal extremities
respectively either free or, independently from each
other, covalently bound to amino acids other than those
which are normally associated with them in the larger
polypeptides or glycoproteins of the LAV virus, which
last-mentioned amino acids are then free or belong to
another polypeptidic sequence. Particularly the
invention relates to hybrid polypeptides containing any
of the epitope-bearing polypeptides which have been
defined more specifically hereabove, recombined with
other polypeptide fragments normally foreign to the LAV
proteins, having sizes sufficient to provide for an
increased immunogenicity of the epitope-bearing
polypeptide, said foreign polypeptide fragments either
being immunogenically inert or not interfering with the
immunogenic properties of the epitope-bearing
polypeptide.

Such hybrid polypeptides, which may contain
from 5 up to 150, even 250 amino acids, usually consist
of the expression products of a vector which contained
ab initio a nucleic acid sequence expressible under the
control of a suitable promoter or replicon in a suitable
host, which nucleic acid sequence had however beforehand
been modified by insertion therein of a DNA sequence
encoding said epitope-bearing polypeptide.

Said epitope-bearing polypeptides, particularly
those whose N-terminal and C-terminal amino acids are
free, are also accessible by chemical synthesis,
according to techniques well known in the chemistry of
proteins.

The synthesis of peptides in homogeneous
solution and in solid phase is well known.
In this respect, recourse may be had to the
method of synthesis in homogeneous solution described
in Houben-Weyl, "Methoden der Organischen Chemie"
(Methods of Organic Chemistry), edited by E. Wunsch,
vol. 15-I and II, Thieme, Stuttgart 1974.
This method of synthesis consists of
successively condensing either the successive amino
acids in twos, in the appropriate order, or successive
peptide fragments previously available or formed and
already containing several aminoacyl residues in the
appropriate order, respectively. Except for the carboxyl
and amino groups which will be engaged in the formation
of the peptide bonds, care must be taken to protect
beforehand all other reactive groups borne by these
aminoacyl groups or fragments. Moreover, prior to the
formation of the peptide bonds, the carboxyl groups are
advantageously activated, according to methods well
known in the synthesis of peptides. Alternatively,
recourse may be had to coupling reactions bringing into
play conventional coupling reagents, for instance of the
carbodiimide type, such as
1-ethyl-3-(3-dimethylaminopropyl)carbodiimide. When the
amino acid used carries an additional amine group
(e.g. lysine) or another acid function (e.g. glutamic
acid), these groups may be protected by carbobenzoxy or
t-butyloxycarbonyl groups, as regards the amine groups,
or by t-butyl ester groups, as regards the carboxylic
groups. Similar procedures are available for the
protection of other reactive groups; for example, the
SH group (e.g. in cysteine) can be protected by an
acetamidomethyl or paramethoxybenzyl group.



WO 87/07906 



PCT/EP87/00326 




In the case of progressive synthesis, amino
acid by amino acid, the synthesis starts preferably by
the condensation of the C-terminal amino acid with the
amino acid which corresponds to the neighboring
aminoacyl group in the desired sequence and so on, step
by step, up to the N-terminal amino acid. Another
preferred technique that can be relied upon is that
described by R.B. Merrifield in "Solid phase peptide
synthesis" (J. Am. Chem. Soc., 85, 2149-2154 (1963)).

In accordance with the Merrifield process, the
first C-terminal amino acid of the chain is fixed to a
suitable porous polymeric resin by means of its
carboxylic group, the amino group of said amino acid
then being protected, for example by a
t-butyloxycarbonyl group.

When the first C-terminal amino acid is thus
fixed to the resin, the protective group of the amine
group is removed by washing the resin with an acid, e.g.
trifluoroacetic acid when the protective group of the
amine group is a t-butyloxycarbonyl group.

Then the carboxylic group of the second
amino acid, which is to provide the second aminoacyl
group of the desired peptidic sequence, is coupled to
the deprotected amine group of the C-terminal amino acid
fixed to the resin. Preferably, the carboxyl group of
this second amino acid has been activated, for example
by dicyclohexylcarbodiimide, while its amine group has
been protected, for example by a t-butyloxycarbonyl
group. The first part of the desired peptide chain,
which comprises the first two amino acids, is thus
obtained. As previously, the amine group is then
deprotected, and one can further proceed with the fixing
of the next aminoacyl group and so forth until the whole
peptide sought is obtained.

The protective groups of the different side
groups, if any, of the peptide chain so formed can then
be removed. The peptide sought can then be detached from
the resin, for example by means of hydrofluoric acid,
and finally recovered in pure form from the acid
solution according to conventional procedures.
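The order of operations in the Merrifield process (C-terminal residue first, then one deprotection and coupling cycle per residue towards the N-terminus) can be written out as a simple plan generator. This is only a bookkeeping sketch: it assumes t-butyloxycarbonyl (Boc) amine protection as in the text, the step labels and function name are hypothetical, and no chemistry (amounts, times, choice of side-chain protecting groups) is modelled.

```python
def spps_plan(sequence: str) -> list[tuple[str, str]]:
    """Ordered steps for Boc solid-phase synthesis of a peptide.

    `sequence` is written N- to C-terminus in one-letter code; the chain is
    assembled from the C-terminal residue towards the N-terminal residue.
    Each step is (operation, residue), with "" when no residue is involved.
    """
    if not sequence:
        raise ValueError("empty sequence")
    deprotect = ("remove Boc group (e.g. trifluoroacetic acid)", "")
    plan = [("fix Boc-protected residue to resin via its carboxyl", sequence[-1])]
    for residue in reversed(sequence[:-1]):
        plan.append(deprotect)
        plan.append(("couple activated Boc-protected residue (e.g. DCC)", residue))
    plan.append(deprotect)
    plan.append(("remove side-chain protecting groups", ""))
    plan.append(("cleave peptide from resin (e.g. hydrofluoric acid)", ""))
    return plan


if __name__ == "__main__":
    for step, residue in spps_plan("MKV"):  # hypothetical tripeptide Met-Lys-Val
        print(step, residue)
```

For the tripeptide Met-Lys-Val, Val is fixed to the resin first and Lys is coupled before Met, which is the C-to-N direction described above.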

As regards the peptide sequences of smallest
size bearing an epitope or immunogenic determinant,
and more particularly those which are readily accessible
by chemical synthesis, it may be required, in order to
increase their in vivo immunogenic character, to couple
or "conjugate" them covalently to a physiologically
acceptable and non-toxic carrier molecule.

By way of examples of carrier molecules or
macromolecular supports which can be used for making the
conjugates according to the invention, mention may be
made of natural proteins, such as tetanus toxoid,
ovalbumin, serum albumins, hemocyanins, etc. Synthetic
macromolecular carriers, for example polylysines or
poly(D,L-alanine)-poly(L-lysine)s, can be used too.
Other types of macromolecular carriers which
can be used, which generally have molecular weights
higher than 20,000, are known from the literature.

The conjugates can be synthesized by known
processes, such as described by Frantz and Robertson in
"Infect. and Immunity", 33, 193-198 (1981), or by P.E.
Kauffman in "Applied and Environmental Microbiology",
October 1981, Vol. 42, no. 4, 611-614.

For instance the following coupling agents can
be used: glutaraldehyde, ethyl chloroformate,
water-soluble carbodiimides
(N-ethyl-N'-(3-dimethylaminopropyl)carbodiimide HCl),
diisocyanates, bis-diazobenzidine, di- and
trichloro-s-triazines, cyanogen bromide, benzoquinone,
as well as the coupling agents mentioned in Scand. J.
Immunol., 1978, vol. 8, p. 7-23 (Avrameas, Ternynck,
Guesdon).
Any coupling process can be used for bonding
one or several reactive groups of the peptide, on the
one hand, and one or several reactive groups of the
carrier, on the other hand. Again, coupling is
advantageously achieved between carboxyl and amine
groups carried by the peptide and the carrier or vice
versa in the presence of a coupling agent of the type
used in protein synthesis, e.g.
1-ethyl-3-(3-dimethylaminopropyl)carbodiimide,
N-hydroxybenzotriazole, etc.
Coupling between amine groups respectively borne by the
peptide and the carrier can also be made with
glutaraldehyde, for instance according to the method
described by Boquet, P. et al. (1982) Molec. Immunol.,
19, 1441-1549, when the carrier is hemocyanin.

The immunogenicity of epitope-bearing peptides
can also be reinforced by oligomerisation thereof, for
example in the presence of glutaraldehyde or any other
suitable coupling agent. In particular, the invention
relates to the water-soluble immunogenic oligomers thus
obtained, comprising particularly from 2 to 10 monomer
units.

The glycoproteins, proteins and polypeptides
(generally designated hereafter as "antigens") of this
invention, whether obtained (by methods such as
disclosed in the earlier patent applications referred to
above) in a purified state from LAV ELI or LAV MAL virus
preparations or, as concerns more particularly the
peptides, by chemical synthesis, are useful in processes
for the detection of the presence of anti-LAV antibodies
in biological media, particularly biological fluids such
as sera from man or animal, particularly with a view to
possibly diagnosing LAS or AIDS.

Particularly the invention relates to an in
vitro process of diagnosis making use of an envelope
glycoprotein of LAV ELI or LAV MAL (or of a polypeptide
bearing an epitope of this glycoprotein) for the
detection of anti-LAV antibodies in the sera of persons
who carry them. Other polypeptides, particularly those
carrying an epitope of a core protein, can be used too.
A preferred embodiment of the process of the
invention comprises:

- depositing a predetermined amount of one or several of
said antigens in the cups of a titration microplate;

- introducing increasing dilutions of the biological
fluid, e.g. serum, to be diagnosed into these cups;

- incubating the microplate;

- washing the microplate carefully with an appropriate
buffer;

- adding into the cups specific labelled antibodies
directed against blood immunoglobulins; and

- detecting the antigen-antibody complex formed, which
is then indicative of the presence of LAV antibodies in
the biological fluid.

Advantageously the labelling of the
anti-immunoglobulin antibodies is achieved by an enzyme
selected from among those which are capable of
hydrolysing a substrate, which substrate undergoes a
modification of its radiation absorption, at least
within a predetermined band of wavelengths. The
detection of the substrate, preferably comparatively
with respect to a control, then provides a measurement
of the potential risks or of the effective presence of
the disease.

Thus preferred methods are immuno-enzymatic or
immunofluorescent detection methods, in particular the
ELISA technique. Titrations may be determinations by
immunofluorescence or direct or indirect
immuno-enzymatic determinations. Quantitative titrations
of antibodies in the sera studied can be made.
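Quantitative titration on serial dilutions, read against a control, can be summarized as an end-point titer. The sketch below is a hypothetical illustration only: the text does not fix a positivity rule, so the cut-off (2 x the control absorbance), the dilution series and the absorbance readings are assumptions, not values from the invention.

```python
from typing import Optional


def serial_dilutions(start: int = 100, fold: int = 2, steps: int = 8) -> list[int]:
    """Reciprocal dilution factors, e.g. 1/100, 1/200, 1/400, ..."""
    return [start * fold ** i for i in range(steps)]


def endpoint_titer(dilutions: list[int], absorbances: list[float],
                   control_absorbance: float,
                   cutoff_factor: float = 2.0) -> Optional[int]:
    """Highest reciprocal dilution still above the cut-off, or None if negative.

    The cut-off rule (cutoff_factor x control) is an illustrative assumption.
    Wells are read from the least to the most dilute; reading stops at the
    first well at or below the cut-off.
    """
    cutoff = cutoff_factor * control_absorbance
    titer = None
    for dilution, absorbance in sorted(zip(dilutions, absorbances)):
        if absorbance > cutoff:
            titer = dilution
        else:
            break
    return titer


if __name__ == "__main__":
    dils = serial_dilutions(steps=5)       # [100, 200, 400, 800, 1600]
    od = [1.80, 1.20, 0.70, 0.35, 0.15]    # hypothetical readings
    print(endpoint_titer(dils, od, control_absorbance=0.10))  # 800
```

Stopping at the first negative well avoids reporting an isolated positive reading at a high dilution, a common precaution in end-point reading; other rules (for example interpolation to the cut-off) are equally possible.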

The invention also relates to the diagnostic
kits themselves for the in vitro detection of antibodies
against the LAV virus, which kits comprise any of the
polypeptides identified herein and all the biological
and chemical reagents, as well as equipment, necessary
for performing diagnostic assays. Preferred kits
comprise all reagents required for carrying out ELISA
assays. Thus preferred kits will include, in addition to
any of said polypeptides, suitable buffers and
anti-human immunoglobulins, which anti-human
immunoglobulins are labelled either by an
immunofluorescent molecule or by an enzyme. In the
latter instance, preferred kits then also comprise a
substrate hydrolysable by the enzyme and providing a
signal, particularly modified absorption of a radiation,
at least at a determined wavelength, which signal is
then indicative of the presence of antibody in the
biological fluid to be assayed with said kit.

It can of course be of advantage to use
several proteins or polypeptides not only of both
LAV ELI and LAV MAL, but also of either or both of them
together with homologous proteins or polypeptides of
earlier described viruses, e.g. of LAV BRU or HTLV-3 or
ARV, etc.

The invention also relates to vaccine
compositions whose active principle is constituted by
any of the antigens, e.g. the hereabove disclosed
polypeptides or whole antigens of either LAV ELI or
LAV MAL, or both, particularly the purified gp110 or
immunogenic fragments thereof, fusion polypeptides or
oligopeptides, in association with a suitable
pharmaceutically or physiologically acceptable carrier.

A first type of preferred active principle
among said immunogens is the gp110 immunogen.

Other preferred active principles to be considered
in that field consist of the peptides containing
fewer than 250 amino acid residues, preferably fewer
than 150, particularly from 5 to 150 amino acid residues,
as deducible from the complete genomes of LAV ELI and
LAV MAL, and even more preferably those peptides which contain
one or more groups selected from Asn-X-Ser and Asn-X-Thr
as defined above. Preferred peptides for use in the
production of vaccinating principles are peptides (a) to
(f) as defined above. By way of non-limiting example,
suitable dosages of the vaccine compositions are those
which are effective to elicit antibodies in vivo in the
host, particularly a human host. Suitable doses range
from 10 to 500 micrograms of polypeptide, protein or
glycoprotein per kg, for instance 50 to 100 micrograms
per kg.
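The peptide criteria above (a length of 5 to 150 residues and at least one Asn-X-Ser or Asn-X-Thr group) can be checked mechanically on a candidate sequence written in one-letter code. In this sketch, X is assumed to be any residue other than proline, which is the usual convention for N-glycosylation sites. The definition referred to as "defined above" is not reproduced in this excerpt, and the function names and example peptide are hypothetical.

```python
import re
from typing import List, Tuple

# Asn-X-Ser/Thr in one-letter code: N, then any residue except proline (assumed
# convention), then S or T. The zero-width lookahead lets overlapping groups
# such as those in "NNST" all be reported.
SEQUON = re.compile(r"(?=(N[^P][ST]))")


def find_sequons(peptide: str) -> List[Tuple[int, str]]:
    """Return (1-based position of the Asn, motif) for each Asn-X-Ser/Thr group."""
    return [(m.start() + 1, m.group(1)) for m in SEQUON.finditer(peptide.upper())]


def is_preferred_candidate(peptide: str, min_len: int = 5, max_len: int = 150) -> bool:
    """True when the length is within min_len..max_len and at least one group is present."""
    return min_len <= len(peptide) <= max_len and bool(find_sequons(peptide))


print(find_sequons("GANISTWNPSK"))            # [(3, 'NIS')]  (N-P-S at 8 is excluded)
print(is_preferred_candidate("GANISTWNPSK"))  # True
```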

The different peptides according to this invention
can also be used themselves for the production
of antibodies, preferably monoclonal antibodies specific
for the respective peptides. For the production
of hybridomas secreting said monoclonal antibodies,
conventional production and screening methods are used.
These monoclonal antibodies, which themselves are part
of the invention, then provide very useful tools for the
identification and even the determination of the relative
proportions of the different polypeptides or proteins in
biological samples, particularly human samples containing
LAV or related viruses.

The invention further relates to the hosts 
(procaryotic or eucaryotic cells) which are transformed 
by the above mentioned recombinants and which are 
capable of expressing said DNA fragments. 

Finally, the invention also concerns vectors
for the transformation of eucaryotic cells of human
origin, particularly lymphocytes, whose polymerases
are capable of recognizing the LTRs of LAV.
In particular, said vectors are characterized by the
presence therein of a LAV LTR, said LTR then being
active as a promoter enabling the efficient transcription
and translation, in a suitable host, of a DNA insert
coding for a determined protein placed under its
control.

Needless to say, the invention extends to
all variants of the genomes and corresponding DNA fragments
(ORFs) having substantially equivalent properties, all
of said genomes belonging to retroviruses which can be
considered as equivalents of LAV.

It must be understood that the claims which
follow are also intended to cover all equivalents of the
products (glycoproteins, polypeptides, DNAs, etc.),
an equivalent being a product, for instance a polypeptide,
which may differ from a particular one defined in
any of said claims, say by one or several amino
acids, while still having substantially the same
immunological or immunogenic properties. A similar rule
of equivalency shall apply to the DNAs, it being
understood that the rule of equivalency will then be tied
to the rule of equivalency pertaining to the polypeptides
which they encode.

It will also be understood that all the
literature referred to hereinbefore or hereinafter, and
all patent applications or patents not specifically
identified herein but which form counterparts of those
specifically designated herein, must be considered as
incorporated herein by reference.

It should further be mentioned that the
invention also relates to immunogenic compositions
containing, preferably, not only any of the polypeptides
more specifically identified above which have the
amino acid sequences of LAV ELI and LAV MAL that have been
identified, but also the corresponding peptidic sequences of
previously defined LAV proteins.
In that respect, the invention relates more particularly
to the polypeptides whose sequences correspond more
specifically to the LAV BRU sequences referred to earlier,
i.e. the sequences extending between the following first
and last amino acids of the LAV BRU proteins themselves,
i.e. the polypeptides having sequences contained in the
LAV BRU OMP or the LAV BRU TMP, or sequences extending over
both, particularly those extending between the
following positions of the amino acids included in the
env open reading frame of the LAV BRU genome:

1-530
34-530
and more preferably:

531-877, particularly
680-700

37-130
211-289
488-530
490-620.
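The positions listed above are 1-based and inclusive. A minimal sketch of how such stretches could be cut out of a translated env sequence follows. The placeholder sequence is hypothetical (the actual LAV BRU env sequence is not reproduced in this passage), the region labels are taken from claim 11, and `extract_stretch` is an illustrative name.

```python
from typing import Dict, Tuple

# Env positions listed above (1-based, inclusive); labels follow claim 11.
ENV_STRETCHES: Dict[str, Tuple[int, int]] = {
    "OMP (gp110) including precursor": (1, 530),
    "OMP (gp110) without precursor": (34, 530),
    "sequence carrying the TMP (gp41)": (531, 877),
    "TMP sub-stretch": (680, 700),
    "well-conserved OMP stretch 1": (37, 130),
    "well-conserved OMP stretch 2": (211, 289),
    "well-conserved OMP stretch 3": (488, 530),
    "end of OMP / beginning of TMP": (490, 620),
}


def extract_stretch(protein: str, start: int, end: int) -> str:
    """Return residues start..end of a protein, using 1-based inclusive positions."""
    if not 1 <= start <= end <= len(protein):
        raise ValueError(f"range {start}-{end} is outside a protein of length {len(protein)}")
    # Python slices are 0-based and end-exclusive, hence start - 1 and end.
    return protein[start - 1:end]


# Hypothetical 880-residue placeholder, NOT the LAV BRU env sequence.
placeholder = "ACDEFGHIKLMNPQRSTVWY" * 44
for label, (start, end) in ENV_STRETCHES.items():
    print(f"{label}: {start}-{end}, {len(extract_stretch(placeholder, start, end))} residues")
```

Checking the bounds explicitly matters because a Python slice past the end of a string silently returns a shorter result instead of failing.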

These different sequences can be used for any
of the above-defined purposes and in any of the
compositions which have been disclosed.

Finally, the invention also relates to the
different antibodies which can be formed specifically
against the different peptides which have been disclosed
herein, particularly to the monoclonal antibodies which
recognize them specifically. The corresponding hybridomas,
which can be formed from spleen cells of animals
previously immunized with such peptides, fused
with appropriate myeloma cells and selected according to
standard procedures, also form part of the invention.

Phage λ clone E-H12 derived from LAV ELI-infected
cells has been deposited at the "Collection
Nationale des Cultures de Micro-organismes" (National
Collection of Cultures of Microorganisms) (CNCM) of the
Pasteur Institute of Paris, France, under No. I-550 on May
9th, 1986.

Phage λ clone M-H11 derived from LAV MAL-infected
cells has been deposited at the CNCM under No. I-551 on
May 9th, 1986.
We claim:

1. The virus LAV ELI whose RNA corresponds to
the cDNA of figs. 7A-7J.

2. The virus LAV MAL whose RNA corresponds to
the cDNA of figs. 8A-8I.

3. The cDNA of figs. 7A-7J or parts thereof. 

4. The cDNA of figs. 8A-8I or parts thereof. 

5. DNA recombinants containing at least part
of the cDNA of claim 3 or 4.

6. A probe containing a cloned nucleic acid
according to any of claims 3 to 5.

7. The method for identifying the presence, in
a host tissue, of a virus or provirus related to either
LAV ELI or LAV MAL, which comprises hybridizing DNA
obtained from said tissue with a probe according to
claim 6 and detecting the presence of said virus or
provirus in said tissue according to whether or not
hybridization with said probe is detected.

8. A peptide, protein, or parts thereof encoded
by open reading frames of the DNA sequences of
claim 3 or 4 or fragments thereof.

9. A peptide of claim 8 which corresponds to
any of the stretches extending respectively
from aminoacyl residue 37 to aminoacyl residue 130,
or from aminoacyl residue 211 to aminoacyl residue 289,
or from aminoacyl residue 488 to aminoacyl residue 530,
of fig. 3.

10. A peptide of claim 8 which corresponds to
the stretch extending from aminoacyl residue 490 to
aminoacyl residue 620 of fig. 3.

11. A portion of a protein or glycoprotein
whose amino acid sequence includes all or part of the
sequences which follow:

OMP or gp110 proteins, including precursors:
1-530

OMP or gp110 without precursor:
34-530

Sequence carrying the TMP or gp41 protein:
531-877, particularly
680-700

Well-conserved stretches of the OMP:
37-130,
211-289 and
488-530

Well-conserved stretch found at the end of the OMP and
the beginning of the TMP:
490-620.

12. A method for the in vitro detection of the
presence of antibodies directed against LAV ELI or LAV MAL,
or against related viruses, in human body fluids, which
comprises contacting said body fluids with antigens
obtained from the viruses of claim 1 or 2, or consisting
of peptides according to any of claims 8 to 10, and
detecting the immunological reaction between said
antigens and said antibodies.

13. The method of claim 12 which comprises:

- depositing a predetermined amount of one or several of
said antigens in the cups of a titration microplate;

- introducing increasing dilutions of the biological
fluid, i.e. the serum to be diagnosed, into these cups;

- incubating the microplate;

- carefully washing the microplate with an appropriate
buffer;

- adding into the cups specific labelled antibodies
directed against blood immunoglobulins; and

- detecting the antigen-antibody complex formed, which
is then indicative of the presence of LAV antibodies in
the biological fluid.
14. A diagnostic kit for the in vitro detection
of antibodies against the viruses of claim 1 or
2, or viruses related thereto, which contains an antigen
obtained from said viruses or consisting of a peptide
according to any of claims 8 to 11, and the biological
and chemical reagents, as well as the equipment, necessary
for performing diagnostic assays.

15. An immunogenic composition containing an
antigen of the virus of claim 1 or 2, or of both, or
any immunogenic peptide encoded by the RNAs of said
viruses or by part thereof, in association with a
suitable pharmaceutical or physiologically acceptable
carrier.

16. The immunogenic composition of claim 15
wherein said peptide is the gp110 envelope glycoprotein
or part thereof.

17. The immunogenic composition of claim 16
which contains a protein or glycoprotein whose amino acid
sequence includes all or part of any of the sequences
which follow:

OMP or gp110 proteins, including precursors:
1-530

OMP or gp110 without precursor:
34-530

Sequence carrying the TMP or gp41 protein:
531-877, particularly
680-700

Well-conserved stretches of the OMP:
37-130,
211-289 and
488-530

Well-conserved stretch found at the end of the OMP and
the beginning of the TMP:
490-620.
18. The antibodies, particularly monoclonal
antibodies, formed against any of the peptides, proteins
or glycoproteins of any of claims 8 to 11.

19. The cells transformed with a DNA recombinant
according to claim 5.
LAV ELI



IgGTCTCTCTGG TTAGACCAGATTTGAGCCTGCGAGCTCTCTGGCTAGC7AGGGAACCCAC 

tgcttaagcctcaataaagcttgccttgagtgcttcaaIgtagtgtgtgcccgtctg.tgt 

100 

GTGACTCTGGTAACTAGAGATCCCTCAGACCCCTTTAGTCAGAGTGGAAAATCTCTAGCA N 

U ^3GCGCCCGAACAGGGACCTGAAAGCGAAAGTAGAACCAGAGGAGCTCTCTCGACGCAG 

200 . 

G/CTCGGCTTGCTGAAGCGCGCACGGCAAGAGGCGAG66GCA6C6ACTGGTSAGTACGCT 

(-VGAG . 300 

'MetGlyAlaArgAlaSerVilLeuSer 
AAAATTTTTGACTAGCGGAGGCTAGAAGGAGAGAGATGGGTGCGAGAGCGTCAGTATTAA 

Gl-'GlvLvsLeuAspLysTrpGluLysIleArgLeuAr 3 ProGlyGlyLysLysLysTyt 
GCGGGGGAAAATTAGATAAATGGGAAAAAATTCGGTTACGGCC AGGAGGAAAGAAAAAAT 

400 

Ar-LeuLysEisIleValTroAlaSerAr S GluLeuGluArgTyrAlaLeuAsnProGly 
ATAGACTAAAACATATAGTATGGGCAAGCAGGGAGCTAGAACGATATGCACTTAATCCTG 

LeuLeuGluThrSerGluGlyCysLysGlnllelleGlyGlnLeuGlnProAlalleGln 
GCCTTTTAGAAACATCAGAAGGCTGTAAACAAATAATAGGGCAGCTACAACCAGC lATTC 

500 . - - * 

ThrGlyThrGiuGluLeuArgSerLeuTyrAsnThrValAlaThrLeuTyrCysValllis 

AGACAGGAACAGAAGAACTTAGATCATTAT ATAATAC AGTAGCAACCCTCTATTGTGTAC 

• ■ oOO 

LysGlylieAspValLysAspThrLysGluAlaLeuGluLyslietGluGluGluGlnAsn 
ATAAAGGAATAGATGTAAAAGACACCAAGGAAG CTTT AGAAAAG ATGGAGGAAGAGCAAA 

LysSerLysLysLysAlaGlnGlnAlaAlaAlaAspThrGlyAsnAsnSerGlnValSer 
AC AAAAGTAAGAAAAAGGCACAGCAAGCAGCAG CTGAC AC AGGAAACAACAGCCAGGTCA 

700 

GlaAsnTyrProIieValGlnAsnLeuGlnGlyGlnMetValHisGlnAlalleSerPro 
GCCAAAATTATCCTATAGTGCAGAACCTAC AGGGGCAAATGGTAC ATCAGGCC ATATCAC 

ArgThrLeuAsnAlaTrpValLysVallleGluGluLysAUPheSerProGluVaTU 
CTAGAACTTTGAACGC ATGGGTAAAAGTAATAG AAGAAAAGGCTTTCAGCCCAGAAGTaA 

800 . • ; 

ProMetPheSerAlaLeuSerGluGlyAlaThrProC-lnAspLeuAsnThrMecLeuKsn 
TACCCATGTTTTCAGCATTATCAGAAGGAGCCACCCCACAAGATTTAAACACCttTGCTAA 

. . • • * 

ThrValGlyGlvKisGluAlaAlaMetGlnMetLeuLysGluThrlleAsnGluGluAla 

ACACAGTGGGGGGACATCAAGCAGCCATGCAAATGCTAAAAG AGACCATCAATGAAGAAG 

# • • 

AlaGluTrpAspArgLe^isProValHisAlaGlyProIleAla^ 
CTGCAGAATGGGATAGGTTACATCCAGTGC ATGCAGGGCCTATTGCACCAGGCC AGATGA 

1000 

GluProArgGlySerAspUeAlaGlyThrThrSerThrLeuGlnGluGUIleAUTrp 
GAGAACCAAGGGGAAGTGATATAGCAGGAACTACTAGTACCCTTC AGGAACAAATAGCAT 

MetThrSerAsnProProIleProValGlyGluIleTyrLysArgTrpllelleValGly 
GGATGACAAGTAACCCACCTATCCCAGTAGGAGAAATCTATAAAAG ATGGATAA iTGTGG 

1100 ... • ; 

LeuAsaLysIleValArgMetTyrSerProValSerlleLeuAspIleArgGlnGlyPro 

GATTAAAIAAAATAGTAAGAATGTATAGCCCTGTCAG CATTTTGGACATAAGiiCAGGGnC 
LysGluProPheArgAspTyrValAspArgPheTyrLysThrLeuArgAlaGluGlnAla
CAAAGGAACCTTTTAGAGACTATGTAGACCGGTTCTATAAAACTCTAAGAGCCGAGCAAG 

• « • • » t 

SerGlnAspVa lLy sAsnTrplIe tThrG luThrLeuLeuValGlnAsnAlaAsr.ProAsp 
CTTCACAGGATCTAAAAAATTGGATGACAGAAACCTTGTTGGTCCAAAATGCAAACCCAG 

1300 

CysLysThrlleLeuLysAlaLeuGlyProG InAlaThrLauGluG lulle tMe tThr Ala 
ATTGCAAGACTATCTTAAAAGCATTGGGACCACAGGCTACACTAGAAGAAATGATGACAG 

• • • • * • 
CysGlnGlyValGlyGlyProSerUisLysAlaArsViilLeuAlaGluAlaKetSerGln 

CATGTCAGGGAG7GGGGGGGCCC AGCCATAAAGCAAGAGTTCTGGCTGAGGCAATGAGCC 
1400 . • . 

AlaThrAsnSerValThrThrAlalle tile tG lnAr^G lyAsnPheLy sG lyProArgLy s 
AAGCAACAAATTCAGTTACTACAGCAATGATGCAGAGAGGCAATTTTAAGGGCCCAAGAA 

. . . . 1500 

I lelleLysCy sPheAsnCy sG lyLysG luG lyEisIl eAla.Ly s As nCy sAr gAlaPr o 
A AA TT AT T AAG TGTTTCAATTGTGGC A A AG A AG G G C A C AT AG C AA AAA A T TGCAGGGCCC 

• « • . * • • , 
ArgLysLysGlyCysTrpArgCysGlyLysGluGlyflisGlnLeuLysAspCysThrGlu 

CTAGGAAAAAGGGCTGTTGGAGATGTGGAAAGGAAGGACACCAACTAAAAGATTGCACTG 
r— •> POL • ♦ 1600 

JPhePheArgGluAsnLeuAlaPheProGlnGlyLysAlaG lyGluLeu 
ArgGlnAlaAs nPheLeuG lyArgl leTrpProSerHisLy sG lyAr gPr oG lyAsnPhe 
AGAGACAGGCTAATTTTTTAGGGAGAATTTGGCCTTCC CACAAGGGAAGGCCGGGGAACT 

• a ■ • • • 

SerProLysG InThrArgAlaAs n S e r ProTh r S e rArgG luLeuAr gVa ITrpGlyArg 
LeuGlnSerArgProG luPr oThrAlaPr oProAl aGluSerPheG lyPheG lyG luG Lu 
TTCTCCAAAGCAGACCAGAGCCAACAGCC CCACCAGC AGAGAGCTTCGGGTTTGGGGAAG 
1700 . .. 

AspAsnProLeuSerLysThrGlyAlaGluArgGlnG ly ThrVa 1 S e rP heAstiP h ePr o 
IleThrProSerGlnLysGlnGluGlnLysAspLysGluLeuTyrProLeuThrSerLeu 
AGATAACC CCCTCTCAAAAACAGGAGCAGAAAGACAAGGAACTGTATCCTTTAACTTCC C 

. GAS . . . 1800 

GlnlleThrLeuTrpG InArgPr oLeuValAlaileLys IleGLyGlyG InLeuLysGlu 
LysSerLeuPheGlyAsnAspProLeuSerGlnl 
TCAAATCACTCTTTGGCAACGACCC CTTGTCGCAAfTAAAAATAGGGGGACAGCTAAAGGA 

• • • • • • 
AlaLeuLeuAspThrG lyAlaAspAspThrVa lLeuG luG lulie t As nLeuProG ly Ly s 

AGCTCTATTAGATAC AGGAGCAGATGATACAGTATTAGAAGAAATGAATTTGCCAGGAAA 

1900 

TrpLysProLystletlleGlyGlylleGlyGlyPiteTleLysValArgG InTyrAspGIn 
ATGGAAACCAAAAATGATAGGGGGAATTGGAGGTTTTATCAAAGTAAGACAGTATGATCA 

• « • . • • • 
IleProIleGluIleCysGlyGlnLysAlalleGlyThrValLeuValGlyProThrPro 

AATACC CATAGAAATCTGTGGACAGAAAGCTATAGGTACAGTATTAGTAGGACCTACGCC 
2000 • 

ValAsnllelleG lyAr gAsnLeuLeuT hrG Inl 1 eG ly Cy sThrLeuAsnPbePr o I 1 e 
TGTCAACATAATCGGAAGAAATTTGTTGACCCAG ATTGGCTGCACTTTAAATTTTCCAAT 

2100 

S erP ro I 1 eG luThr Va 1? roVa lLy sLeuLy s?r oG lyMe t As pG lyProLy s Va lLy s 
TAGTCCTATTGAAACTGTACCAGTAAAATTAAAGCCAGGAATGGATGGCCCAAAAGTTAA 



G InTrpProLeuThrGluGluLy sIleLy sAlaLeuThrG lul leCy sThrAspMetG lu 
ACAATGGCCATTGACAG AAGAAAAAATAAAAGCATTAACAGAAATTTGTACAGATATGGA 

2.200 
/-i /-i i ti c»,a..ti -ni v Pi-oGluAsnProTyrAsnThrPr.oIlePheAla
LysGluGlyLysIleSerArglleG Lyr. ou tufts 7 _ 

AAAGGAAGGAAAAATTTCAAGAATTGGGCCTGAAAATCCATACAATACTCCAATATTTCC 
t c i „ e c 0 rV-iThrValLeuAsDValGivAspAlaTyrPheSerValProLeuAspGlu 

s fflS«;iI^;^!w«~««»«TTie« ! iicecni«*i S A 

AsoPheAr-LvsTvrThrAlaPhelhrlleSerSerlleAsaAsnGluThrProGlylU 

wiSiJiSciilSScMJcTTTACCATATCTACTlTAAACAATCAOACACCAGGGAT 

, 2500 • • 

Ar«TvrGlnTyrAsnValLuProGlnGlyTrpLysGlySerProAlaIlePheGlnSer 

tagIiItcagtacaatgtgcttccacagggatggaaaggatcaccggcaatattccaaag 

SerHetT^LysIleLeuGluProPheA^LysGlnAs^ 

tagcatgacaaaaatcttagagccctttagaaaacaaaatccagaaatggttatctatca 

Tvrl'etAspAspLeuTyrValGlySerAspLeuGluIleGlyGlnHisArgThrLysIle 
AScAlGG^J^T^GTilGTAGGlTCTGACTTAGAAATAGGGCAGCATAGGACAAAAAT 

GluLysLeuArsGluKisLeuLeuArgT^pGIyPheThrArgFroAsplysL^ 
AGAGAAATTAAGAGAACATCTATTGAGGTGGGGATTTACCAGACCAGATAAAAAACATCA 

.,..♦* 

LvsGluProProPheLeuTrpMetGlyTyrGluLeuHisProAspLysTrpThrValGln 

GAAAGAACCCCCATTTCTTTGGATGGGTTATGAACTCCATCCTGATAAATGGACAGTACA 

2800 

SerlleLysLeuProGluLysGluSerTrpThrValAsnAspIleGlnAsnLeuValGlu 
GTCTATAAAACTGCCAGAAAAGGAGAGCTGGACTGTCAATG ATATACAGAACTTAGTGGA 

. • • * 

a^t o^Ac^Tt-nAlaSerGlnlleTyrProGlylleLysValArgGlnLeuCysLysLeu 

GAGATTAAACTGGGCAAGCC AGATTTA^ 

LeuArgGlyThrLysflaLuThrGluvIlIleProLeuThrGluGluAlaGlu^ 
CCTTAGGGGAACCAAAGCACTAACAGAAGTAATACCACTAACAGAAGAAGCAGAATTAG» 

LeuAlaGl^AsnArgGluheLeuLysGluProValH^ 
ACTGGCAGAAAACAGGGAAATTTTAAAAGAACC^TACATGGAGTGTATTATGx.CCCAIC 

Lvs^spLeuIleAlaGluileGlnLysGUGlySisGlyGlnTrpThrTyrGlnlleTyr 

GlnGluPr^heLysAsn^uLysThrGiyLvsTyrAlaArgMetArgGlyAlaEisThr 
TCAAGAACCATTTAAAAATCTGAAAACAGGAAAGTATGCAAGAATGAGGGGTGCCCxYCaC 

AsnAspVaiLysC-lnLeuAlaGluAlaValGlnArgll^Se^ 

taatga^gtaaIgcaattagcagaggcagtgcaaagaatatccacagaaagcatagtgat 

TrpGlyArgThrProLysPheA.rgLeuP^IleGlnLys^^ 

atggggIaggactcctaaatttagactacccatacaaaaggaaacatgggaaacatggtg 
AlaGluTyrTrpGlnAlaThrTrpIleProGluTrpGluPheValAsnThrProProLeu
GGCAGAGTATTGGCAAGCCACTTGGATTCCTGAGTGGGAATTTGTCAATACCCCTCCTTT 

• • • • • * 

ValLysLeuTrpTyrGlnLeuGluLysGluProIlelleGlyAlaGluThrPheTyrVal 
AGTAAAATTATGGTACCAGTTAGAGAAGGAACCCATAATAGGAGCAGAAACTTTCTATGT 

3400 

AspGlyAlaAlaAsnArgGluThrLy sLeuGlyLy sAlaG lyTyr ValThr As pAr gG ly 
AGATGGGGCAGCTAATAGAGAGACTAAATTAGGAAAAGCAGGATATGTTACTGAGAGAGG 

• « • ■ • • • 
A rgG Lillys Val Va IP r oLeuThrAspThrThr AsnG InLysThrG luLeuG InAl a lie 

AAGACAGAAAGTTGTCCCTTTGACTGACACGACAAATCAGAAGACTGAGTTACAAGCAAT 
3500 . 

AsnLeuAlaLeuGlnAspSerGlyLeuGluValAsnlleValThrAspSerGlnTyrAla 
TAA T C TAG C C TTG C A GG ATT C G GG AT TAG AAGT A AAC AT AG TAAC AG AT TC AC A AT ATGC 

36 00 

LeuGly IlelleGlnAlaGlnProAspLysSerGluSerGluLeuValAsnG lnllelle 
ATTAGGAATCATTCAAGCACAACCAGATAAGAGTGAATCAGAGTTAGTCAATCAAATAAT 
■ » « • • i 

GluGlaLeuIleLysLysG luLy sValTyrLeuAlaTrp ValProAlaEisLy sGlylle 
AGAGCAGTTAATAAAAAAGGAAAAGGTTTACCTGGCATGGGTACCAGCACACAAAGGAAT 

. . 3700 

GlyGlyAsnG luGlnValASpLysLeuValSerGlriGlylleArgLys ValLeuPheLeu 
TGGAGGAAATGAACAAGTAGATAAATTAGTC AGTCAAGGAATCAGGAAAGTACTATTTTT 
« • • • * • 

AspGly IleAspLysAlaGlnGluGluHisGluLy sTy rHis As nAsriTrpArgAl aMe t 
GGATGGAATAGATAAGGCTCAAGAAGAAC ATGAGAAATATCACAACAATTGGAGAGCAAT 
3S00 

AlaSerAspPheAsnLeuProProValValAlaLysGluIleValAlaSerCysAspLys 
GGCTAGTGATTTTAACCTACCACCCGTGGTAGCAAAAGAAATAGTAGCTAGCTGTGATAA 

... 3900 

CysGlnLeuLysGlyGluAlaMetHisG lyGlaValAspCy s SerProG lylleTrpGln 
ATGTCAGCTAAAAGGAGAAGCCATGCATGGACAAGTAGACTGTAGTCCAGGAATATGGCA 

• • • • • ■ . 
LeuAspCysThrHisLeuGluGlyLysVallleLeuValAlaValKisValAlaSerGly 

ATTz\GATTGTACACACTTAGAAGGAAAAGTTATCCTGGTAGCAGTTCATGTAGCC AGTGG 

4000 

Tyr IleGluAlaGluVallleProAlaG luThrG lyG InG luThrAlaTyrPheLeuLeu 
CTATATAGAAGCAGAAGTTATTCCAGCAGAAACAGGGC AGGAAACAG CATATTTTCTTTT 

• • • • • • 
Ly sLeuAlaGlyArgTrpProValLys Val ValJJisThrAspAsnG ly S e r As nPh eThr 

AAA AT TAG CAGGAAGATGGCCAGTAAAAGTAGTACATACAGACAATGGCAGCAATTTCAC 
4100 .... 
SerAlaAlaValLysAlaAlaCy sTrpTrpAl'aG ly I leLysG laG luPheG lyl lePro 
CAGTGCTGCAGTTAAGGCCGCCTGTTGGTGGGCAGGTATCAAACAGGAATTTGGAATTCC 

4200 

TyrAsnProGlnSerGlnG ly Va 1 Va 1G lu S e rMe t As nLy S G luL euLy s Ly s 1 1 e 1 1 e 
C T A C A A T C C C C A A AG T C A AG G A G T A G T A G A A T C T A T G A A T A A AG A A T TAA AG A A A A T T AT 

• •«... 
GlyGlnValArgAspGlnAiaG luH isLeuLy sThr Al a Va lGlnnetAlaValPhelle 

AGGACAGGTAAG AGATCAAGCTGAACATCTTAAGACAGCAGTACAAATGGCAG TATTCAT 

4300 

HisAsnPheLysArgArgArgGlylleGlyGlyTyrSerAlaGlyGluArgI lelleAsp 
CCACAATTTTAAAAGAAGAAGGGGGATTGGGGGATAC AGTGC AGGGGAAAGAATAATAGA 
IleIleAlaThrAspIleGlnThrLysGluLeuGlnLysGlnIleIleLysIleGlnAsn
CATAATAGCAACAGACATACAAACTAAAGAATTACAAAAACAAATTATAAAAATTCAAAA- 
4400 

PheArgValTyrTyrArgAspSerArgAspProIleTrpLy sG lyPr oAl aLy sLeuLeu 
TTTTCGGGTTTATTACAGAGACAGCAGAGATCCAATTTGGAAAGGACCAGCAAAGCTCCT 

4500 

TrpLysGlyGluGlyAlaVal VallleGlnAspLy sSerAspIleLysValValProArg 
CTGGAAAGGTGAAGGGGCAGTAGTAATACAAGACAAGAGTGACATAAAGGTAGTACCAAG 

• • _r-^3 • • • • 

ArgLysValLysIlelleArgAsp'DyrGlyLysGlnMetAlaGlyAspAspCysValAla 

/lie tG luAsnArgTrpG LnValHe t I 1 e Va ITrpG In 
AAGAAAAGTAAAGATTATTAGGGATTATGGAAAACAGATGGCAGGTGATGATTGTGTGGC 
PQL^-v • 4600 



SerAr gGlnAs pG luAs p 
ValAspArglie C Argil 



eLysThrTrpLy s SerLeuValLy sEisHisKe tTyrValSer 



AAGTAGACAGGATGAGGATITAAAACATGGAAAAGTTTAGTAAAACACCATATGTATGTTT 

• . • • • • • 

Ly sLysAlaAsnArgTrpPheTyrArgHisEisTyrGluSerProEisProLy s IleSer 
CAAAGAAAGCTAAC AGATGGTTTTATAG ACATCAC TATG AAAGCCCCCACC CAAAAATAA 
4700 . » 

SerGluValHisIleProLeuGlyGluAlaArgLeuVallleLysTbrTyrTrpGlyLeu 
GTTCAGAAGTACACATCCCACTAGGAGAAGCTAGACTGGTAATAAAAACATATTGGGGTG 

4800 

EisThrGlyGluArgGluTrpKisLeuGIyGlnGlyValSerlleGluTrpArgLysArg 
TGCATACAGGAG AAAGAGAATGGCATGTGGGTCAGGGAGTCTCCATAGAATGGAGGAAAA 

• ••••• 
ArgTyrSerThrGlnValAspProGlyLeuAlaA.spGlnLeuIleEis!Iet:TyrTyrPbe 

GGAGATATAGCACACAAGTAGACCCTGGCCTGGCAGACCAACTAATTCATATGTATTATT 

4900 • 

AspCysPheSerGluSerAlalleArgLysAlalleLeuGlyAspIleValSerProArg 
TTGATTGTTTTTCAGAATCTGCTATAAGAAAAGCCATATTAGGAGATATAGTTAGTCCTA 

• • • • • • 
CysGluTyrGlnAlaGlyKisAsnLysValGlySerLeuGlnTyrLeuAlaLeuThrAla 

GGTGTGAGTATC AAGC AGGACATAAC AAGG TAGGAT CCCTACAGTATTTGGCACTAACAG 
5000 • . . 

LeuIleAlaProLysG In I leLysPr oProLeuProS erVa lArgLy sLeuThrG luAs p 
CATTAATAGCACCAAAACAG ATAAAG CCAC C7TTGCCTAGTG TTAGGAAGCTAACAGAAG 
r-^. R . . • ' ■ • • 5100 

Vie tGluGlnAlaProAlaAspGlnGlyProG InArgG luPr oTyr AsnG luTrpAl a 
ArhTrpAsnLysProGlnGlnThrArgG 1 y H i s^r gG ly S e rEis ThrMe t As nG 1 yEi s 
ATAG^TGGAACAAGCC C CAGCAGACC AGGGGCCACAGAGGGAGCCATACAATGAATGGGC 

Q » • • • ■ • • 

LsuGluLeuLeuGluG luLeuLy s S e rG I uAl a Va 1 Ar gE i s? heP r o Ar g I 1 eT rpLeu 
ATfTAGAGCTTTTAG AGGAGCTTAAGAGTGAAGCTGTTAG ACATTTTCCTAGGATATGGCT 

5200 

EisSerLeuG i yG InK isj 1 eTy rG luThr Ty rG lyAs pThrTrp Va 1G 1 y Va 1G luAia 
CCATAGCTTAGGACAACATATTTATGAAACTTATGGGGATACCTGGGTAGGAGTTGAAGC 

• • • • • ■ • 

I lelleArglleLeuGlnGlnLeuLeuPhelleEisPheArglleGlyCysGlnEisSer 
TATAATAAGAATACTGCAACAATTACTGTTTATTC ATTTCAGAATTGGGTGTCAACATAG 
5300 . p5>S . R ~£-i 

Arg IleG ly I 1 e I 1 e Ar gG InArg Ar gAl aAr g* snGlySerSerArgSer 

MetAspProValAspProAsnLeuGlu 
ATGGATCCAGTAGATCC 1 



CAGAATAGGCATTATTCGACAGAGAAGAGCAAGA/ 



IAACCTAG 
5400 
ProTrpAsnHisProGlySerGluProArgThrProCysAsnLysCysHisCysLysLys
AGCCCTGGAACCATCCAGGAAGTCAGCCTAGGACTCCTTGTAACAAGTGTCATTCTAAAA 

Cy sCy sTy rHisCy sPro ValCy sPheLeuAsnLy sG lyLeuGlylleSerTyrGlyArg 
AGTGTTGCTATCATTGCCCAGTTTGCTTCTTAAACAAAGGCTTAGGCATCTCCTATGGCA 

5500 

LysLysArgArgG InArgArgGlyProProG InG LyGlyG InAl aHisG InValPro II e 
GGAAGAAGCGGAGACAGCGACGAGGACCTCCTCAAGGCGGT-CAGGCTCATCAAGTTCCTA 

ProLy sGlnl 

TACCAAAGCAGJlAAGTAGTACATGTAATGCAAC-CTTTAGGGATAATAGCAATAGCAGCAT 
5600 .... 
TAGTAGTAGCAATAATACTAGCAATAGTTGTGTGGACCATAGTATTCATAGAATATAGAA 
. - . . • 5700 

GGATAAAAAAGCAAAGGAGAATAGACTGTTTACTTGATAGAATAACAGAAAGAGC AG AAG 

rr^ENV * • • - ■ 

lMetArgAlaArgGlylleGluArgAsnCysGLnAsnTrpTrpLysTrpGly 
ACAGTGGCAATGAGAGCGAGGGGGATAGAGAGAAATTGTCAAAA'CTGGTGGAAATGGGGC 

5800 

IleMe tLeuLeuGlylleLeulie tThr Cy s S er A 1 aAl aAs pAs nLeuTr p Va ITh r Va 1 
ATCATGCTCCTTGGGATATTGATGACCTGTAGTGCTGCAG ACAATCTGTGGGTCACAGTT 
... • • • * 

TyrTyrGlyValProValTrpLysG luAl aT hrThrThr LeuPheCy sAlaSerAspAla 
TATTATGGGGTGCCTGTATGGAAGGAAGCAACCACCACTCTATTTTGTGCATCAGATGCT 

. 5900 . 
LysSerTyrGluThrGluAlaHisAsnll eTr pAl aThrK i s Al aCy s Va lPr oThr As p 
AAATCATATGAAACAGAGGCACATAATATCTGGGCCACAC ATGCC TGTGTACC CACGGAC 
. • . . . . 6000 

ProAsaProGlnG lu I 1 eAl aLeuG luAs nVa IThr G luAs nPheAsuMe t TrpLy s As n 
CCCAACCCACAAGAAATAGCACTGGAAAATGTGACAGAAAACTTTAACATGTGGAAAAAT 

• • • • • 

AsnMetValGluGlnMetHisGluAspIlelleSerLeuTrpAspGlnSerLeuLysPro 
AACATGGTGGAACAGATGCATGAGGATATAATCAGTTTATGGGATCAAAGCCTAAAACCA 

6100 

Cy sValLvsLeuThrProLeuCy s Va lThrLeuAsnCy s Ser As pG luLeuAr gAsnAsn 
TGTGTAAAATTAACCCCACTCTGTGTCACTTTAAACTGTAGTGATGAATTGAGGAACAAT 
« • • • * • 

GlyThrUetGlyAsnAsnValThrThrGluGluLysG lylletLy.sAsnCy sSerPheAsn 
GGCACTATGGGGAAC AATGTCACTACAGAGGAGAAAGGAATGAAAAACTGCTCTTTCAAT 

6200 ... • ■ 

ValThrThrVa lLeuLy sAspLy sLy sG InG lnV41Ty r Ai aLe uPh eTy r Ar gLeuAs p 
GTAACCACAGTACTAAAAGATAAGAAGCAGCAAGTATATGCACTTTTTTATAGACTTGAT 

. . - 6300 

IleValProIleAsoAsnAspSer SerThrAsnSerThrAsnTyrArgLeuIleAsnCys 
A T AG T A C C A A T AG A C A AT G AT AG T AG T A C C A A T AG T A C C A A T T AT AG G T T A AT A A AT TG T 

• • * • • • 
AsnThrSerAl a I leThrG InAiaCy sProLy sVal SerPheG luProIleProIleKis 
AA TACCTC AGCCATTACACAGGCTTGTCCAAAGGTATCCTTTGAGCCAATTCC CAT AC AT 

6400 

TyrCysAlaProAlaGlyPbeAlalleLeuLysCysArgAspLysLysPheAsnGlyThr 
TATTGTGCCCC AGCTG.GTT7TGCGATTCTAAAGTGTAG AGATAAGAAGTTCAATGGAACA 

• ' • • « • ■ 
GlyProCy s Thr As n Va 1 S er Thr Va 1G lu Cy sThrii i sG 1 y 1 1 eAr g? ro Va 1 Va 1 S e r 
GGCCCATGCACAAATGTCAGCACAGTACAATGTACACATGGAATTAGGCCAGTGGTGTCA 

6 500 
ThrGlnLeuLeuLeuAsnGlySerLeuAlaGluGluGluValIleIleArgSerGluAsn
ACTCAACTGCTGTTGAATGGCAGTCTAGCAGAAGAAGAGGTCATAATTAGATCCGAAAAT 

6600 

LeuThrAsnAsnAlaLysAsnllelleAlaHisLeuAsnGluSerValLysIleThrCys 
CTCACAAACAATGCTAAAAACATAATAGC ACATCTTAATG AATCTGTAAAAATTACCTGT 

• t « • ' • * 

AlaArgProTyrGlnAsnThrArgGlnArgThrProIleGlyLeuGlyGliiSerLeuTyr 
GCAAGGCCCTATCAAAATACAAGACAAAGAACACCTATAGGACTAGGGCAATCACTCTAT 

6700 

ThrThrArgSerArgSerllelleG lyGlnAlaKisCy sAsnlleSerArgAlaGlnTrp 
ACTACAAGATCAAGATCAATAATAGGACAAGCACATTGTAATATTAGTAGAGCACAATGG 

• ••••• 

SerLysThrLeuGlnGlnValAlaArgLysLeuGlyThrLeuLeuAsnLysThrllelle 
AGTAAAACTTTACAAC AAGTAGCTAGAAAATTAGGAACCCTTCTTAACAAAACAATAATA 

6000 

LysPheLysProSerSerGlyGlyAspProGluIleThrThrHisSerPheAsnCysGly 
AAGTTTAAACCATCCTCAGGAGGGGACCCAGAAATTACAACACACAGTTTTAATTGTGGA 

6900 

GlyGluPhePheTyrCysAsnThrSerGlyLeuPheAsnSerThrTrpAsnlleSerAla 
GGGGAATTCTTCTACTGTAATACATCAGGACTGTTTAATAGTACATGGAATATTAGTGCA 

• • • • • • 
TrpAsnAsnll eThrGluSerAsnAsnSerThrAsnThrAs.nl leThrLeuGlnCy sArg 
TGGAATAATATTACAGAGTCAAATAATAGCACAAACACAAACATCACACTCCAATGCAGA 

7000 

IleLysGlnllelleLysLletValAlaGlyArgLy s Al a I 1 eTy r AlaPr oPr o I 1 eG lu 
ATAAAACAAATTATAAAGATGGTGGCAGGCAGGAAAGCAATATATGCCCCTCCTATCGAA 

• • • • • • 

ArgAsnlleLeuCysSer SerAsnlleThrGlyLeuLeuLeuThrArgAspGlyGly lie 
AGAAACATTCTATGTTCATCAAATATTACAGGGCTACTATTGACAAGAGATGGTGGTATA 

7100 . . • • 

AsnAsnSerThrAsnGluThrPheArgProGlyGlyGlyAspUetArgAspAsnTrpArg 
AATAATAGTACTAACGAGACCTTTAGACCTGGAGGAGGAGATATGAGGGACAATTGGAG A 

. • 7200 

SerGluLeuTyrLysTyrLys ValVal Gin II eGluPro LeuGlyValAl aProThrArg 
AGTGAATTATATAAATATAAGGTAGTACAAATTG AACCACTAGGAGTAGCACCCAC CAGG 

• . • • • • 
AlaLysArgArgValValGluArgGluLysArgAlalleGlyLeuGlyAlalietPheLeu 
GCAAAGAG AAGAGTGGTGGAAAGAGAAAAAAGAGCAATAGGATTAGGAGCTATGTTGCTT 

7300 

GlyPheLeuGlyAlaAlaGlySerThrlletG lyiU aArg Ser ValTbrLeuThrValGln 
GGGTTCTTG G G AG C AG C AG G AAG C A C G AT G G G C G C AC G G T C AG TG AC G C TG AC G G T A C AG 

• ••••• 
AlaArgG lnLeuile? S e rG ly I le Va 1 G InG InG InAs nAs nLeuL euAr gAla I 1 eG 1 u 
G C C A G xi C A A T T A A T G T C T G G T A T AG T G C A A C A G C A A A A C A A T T T G C T G AG G G C T A T AG AG 

7 400 

AlaG InGlnHisLeuLeuG InLeuThr Va IT r ?G 1 yl 1 eLy sG InLeuG InAl aAr g 1 1 e 
GCGCAACAGCATCTGTTGCAACTCACGGTCTGGGGCATTAAACAGCTCCAGGCAAGAATC 

7500 

LeuAlaValGluArgTyrLeuLysAspGlnGlnLeuLeuGlyllsTrpGlyCysSerGiy 
CTGGCTGTGGAAAGATACCTAAAGGATCAACAGCTCCTAGGAATTTGGGGTTGCTCTGGA 
LysHisIleCysThrThrAsnValProTrpAsnSerSerTrpSerAsnArgSerLeuAsn
AAACACATTTGCACCACTAATGTGCCCTGGAACTCTAGTTGGAGTAATAGATCTCTAAAT 

7600 ■ • 

GluIleTrpGlnAsnMetThrTrplIetGluTrpGluArgGluIleAspAsnTyrThrGly 
GAGATTTGGCAGAACATGACCTGGATGGAGTGGGAAAGAGAAATTGACAATTACACAGGC 

• • • * • • 
LeuIleTyr SerLeuIleGluGluSerGlnThrGlnG InG LuLy sAs nG luLy sG luLeu 
TTAATATATAGCTTAATTGAGGAATCGCAGACCCAGCAA'GAAAAGAATGAAAAAGAATTG 

7700 .... 
LeuGluLeuAspLysTrpAlaSerLeuTrpAsaTrpPheSerlleThrGlnTrpLeuTrp 
TTGGAATTGGACAAGTGGGCAAGTTTGTGGAATTGGTTTAGCATAACACAATGGCTGTGG 

. . . 7800 

TyrlleLysIlePhellelletllelleGlyGlyLeuIleGlyLeuArglleValPheAla 
TATATAAAAATATTCATAATGATAATAGGAGGCTTGATAGGTTTAAGAATAGTTTTTGCT 

• . * . . . 
ValLeuSerLeuValAsnArgValArgG InGlyTyrSerProLeuSerPheGlnThrLeu 
GTGCTTTCTTTAGTAAATAGAGTTAGGCAGGGATACTCACCTCTGTCGTTTCAGACCCTC 

7900 

LeuProAlaProArgGlyProAspArgProGluG lyThrGluGluGluGlyGlyGluArg 
CTCCCAG CCCCGAGGGGACC CGACAGGCCCGAAGGAACAGAAGAAGAAGGXGGAGAGCGA 

• • • . • . 
GlyArgAspArgSerValArgLeuLeuAsnGlyPheSerAlaLeuIleXrpAspAspLeu 
GGCAGAGACAGAXCCGXGAGAXXGCXGAACGGAXTCXCGGCACXXATCXGGGACGACCXG 

8000 . 

ArgSerLeuCy sLeuPheSerTy rH i s ArgLeuAr gAs pLeu II eLeuIl eAlaVa lArg 
CGGAGCCTGTGCCTCTTCAGCTACCACCGCTTGAGAGACTTAATCTTAATTGCAGTGAGG 

8100 

IleValG luLeuLeuG lyArgArgGlyTrpAspIleLeuLy s Ty r LeuTrpAs nLeuLeu 
ATTGTAGAACTTCTGGGACGCAGGGGGTGGGACATCCTCAAATATCTGTGGAATCTCCTA 

• * . ♦ . . 
G InTyrTrpSerGInG luLeuArgAsnSer AlaSer SerLeuPheAspAlalleAlalle 
CAGTATTGGAGTCAGGAACTGAGGAACAGTGCTAGTAGCTTGTTTGATGCCATAGCAATA 

820 0 . . 

AlaValAlaGluG lyThr As pArg Val IleG lul 1 e II eG LnArgAl aCy sArgAl aVa 1 
GCAGTAGCTGAGGGGACAGAT AGAGTTATAGAAATAATACAAAGAGCTTGCAGAGCTGTT 
« . • • • W ^ — ■ * • • 

LeuAsnlleProArgArglleArgGlnGlyLeuG luArgSerLeuLeu T~*^ 

jlietG lyGly 

C X X A A C A X A C C C A G A A G A A X A A G A C A G G G C X X A G A A A G G T X X A C X rX A A A A X G G G T G G - 
8300 ... 
LysTrpSerLysSerSerIleVc:lGlyTrp?ro/!laIleArsGluArgIleArgArgThr 
CAAAXGGXCAAAAAGXAGXAXAGTGGGAiGGCCXGCTAXAAGGGAAAGAATAAGAAGAAC 

8400 

AsaPr oAl aAl aAspG ly Vz 1G lyAi aVa 1 SerArgAs pLeuC- luLy sEisG lyAl a 1 1 e 
T A A T C C A G C AG C AG A T G G G G X A G G A G C AG X A X C T C G AG A C C TG G A A A A A C A X G G G G C A A X 

• . . • • • 
ThrSerSerA'snThrAlaSerThrAsnAlaAspCysAlaTrpLeuGluAlaGliiGluGlu 

C A C AAG T AG C AA T AC AG C A AG T A C X A AXG CTGACTGTGCCTGGCTAG A AG C AC A AG A AG A 

35 0 0 

SerAspGluValGlyPheProValArgProG InVa IP roLeuAr g?r olle t ThrTy rLy s 
GAGCGACGAGGXGGGCXXXCC AGXC AGACC CCAGGXACCXXXAAGACC AAXGACXX ACAA 

r£ U 3 

GluAlaLeuAspLeuSerHisPheLeuLysGluLysGlyGlyLeuGluGlyLeuIleTrp 
AG A AG C T C T A G A T C T C A G C C A C T T T XT A A A AG AA AAG GGGGG^CTG G AAG G G C T A A T T TG 
8600 .... 
SerLysLysArgGlnGluIleLeuAspLeuTrpValTyrAsnThrGlnGlyIlePhePro
GTCCAAAAAGAGACAAGAGATCCTTGATCTTTGGGTCTACAACACACAAGGCATCTTCCC 

■ • . . 8700 
AspTrpGlnAsnTyrThrProGlyProGlylleArgTyrProLeuThrPheGlyTrpCys 
TGATTGGCAAAACTACACACCAGGGCCAGGGATCAGATATCCACTAACCTTTGGATGGTG 

. 

TyrGluLeuValProValAspProGlnGluValGluGluAspThrGluGlyGluThrAsn 
CTACGAGCTAGTACCAGTTGAT CCACAGGAGGTAGAAGAAGACACTGAAGGAGAG ACCAA 

8800 

SerLeuLeuKisProIleCysGlnEisGlyMetGluAspProGluArgGlnValLeuLys 
CAGCTTGTTACACCCTATATGCCAGCATGGAATGGAGGACCCGGAGAGACAAGTGTTAAA 
• 

TrpArgPheAsnSerArgLeuAlaPheGluEisLysAlaArgGlulIetHisProGluPhe 
ATGGAGATTTAACAGCAGACTAGCATTTGAG CACAAGGCC CGAGAGATGCATCCGGAGTT 
- 8900 . ... 

TyrLysAsn 

CTACAAAAACTGATGACAC.CGAGCTTTCTACAAGGGACTT7CCGCTGGGGACTTTCCAGG 

9000 

GAGGCGXGGACTGGGCGGGACTGGGGAG TGGCTAACCCTCAGATGCTG CATATAAGCAG C 

TGCTTTTTCCCTGTACTCpGTCTCTCTGGTTAGACCAGATTTGAGCCTGGGAGCTCTCTC 

9100 . 

GCTAGCTAGGGAACCCACTG CTTAAGCCTCAATAAAG CTTGCCTTGAGTG CTTCAA | 
LAV MAL



[GGTCTCTCTTGTTAGACCAGGTCGAGCCCGGGAGCTCTCTGGCTAGCAAGGAACCCACTG
CTTAAGCCTCAATAAAGCTTGCCTTGAGTGCCTCAA]GCAGTGTGTGCCCATCTGTTGTGT
100 (U5)
GACTCTGGTAACTAGAGATCCCTCAGACCACTCTAGACGGTGTAAAAATCTTTAGCAGTG
GCGCCCGAACAGGGACTTTAAAGTGAAAGTAACAGGGACTCGAAAGCGGAAGTTCCAGAG
200
AAGTTCTCTCGACGCAGGACTCGGCTTGCTGAGGTGCACACAGCAAGAGGCGAGAGCGGC
MetGlyAlaArg
GACTGGTGAGTACGCCAATTTTTGACTAGCGGAGGCTAGAAGGAGAGAGATGGGTGCGAG
AlaSerValLeuSerGlyGlyLysLeuAspAlaTrpGluLysIleArgLeuArgProGly
AGCGTCAGTATTAAGCGGGGGAAAATTAGATGCATGGGAGAAAATTCGGTTAAGGCCAGG
400
GlyLysLysLysTyrArgLeuLysHisLeuValTrpAlaSerArgGluLeuGluArgPhe
GGGAAAGAAAAAATATAGACTGAAACATTTAGTATGGGCAAGCAGGGAGCTGGAAAGATT
AlaLeuAsnProGlyLeuLeuGluThrGlyGluGlyCysGlnGlnIleMetGluGlnLeu
CGCACTTAACCCTGGCCTTTTAGAAACAGGAGAAGGATGTCAACAAATAATGGAACAGCT
500
GlnSerThrLeuLysThrGlySerGluGluIleLysSerLeuTyrAsnThrValAlaThr
ACAATCAACTCTCAAGACAGGATCAGAAGAAATTAAATCATTATATAATACAGTAGCAAC
600
LeuTyrCysValHisGlnArgIleAspValLysAspThrLysGluAlaLeuAspLysIle
CCTCTATTGTGTACATCAAAGGATAGATGTAAAAGACACCAAGGAAGCGCTAGATAAAAT
GluGluIleGlnAsnLysSerArgGlnLysThrGlnGlnAlaAlaAlaAlaGlnGlnAla
AGAGGAAATACAAAATAAGAGCAGGCAAAAGACACAGCAGGCAGCAGCTGCACAGCAGGC
700
AlaAlaAlaThrLysAsnSerSerSerValSerGlnAsnTyrProIleValGlnAsnAla
AGCAGCTGCCACAAAAAACAGCAGCAGTGTCAGTCAAAATTACCCCATAGTGCAAAATGC
GlnGlyGlnMetIleHisGlnAlaIleSerProArgThrLeuAsnAlaTrpValLysVal
ACAAGGGCAAATGATACATCAGGCCATATCACCTAGGACTTTGAATGCATGGGTGAAAGT
800
IleGluGluLysAlaPheSerProGluValIleProMetPheSerAlaLeuSerGluGly
AATAGAAGAAAAGGCTTTCAGCCCAGAAGTGATACCCATGTTCTCAGCATTATCAGAGGG
900
AlaThrProGlnAspLeuAsnMetMetLeuAsnIleValGlyGlyHisGlnAlaAlaMet
GGCCACCCCACAAGATTTAAATATGATGCTGAACATAGTTGGAGGACACCAGGCAGCTAT
GlnMetLeuLysAspThrIleAsnGluGluAlaAlaAspTrpAspArgValHisProVal
GCAAATGTTAAAAGATACCATCAATGAGGAAGCTGCAGACTGGGACAGGGTACATCCAGT
1000
HisAlaGlyProIleProProGlyGlnMetArgGluProArgGlySerAspIleAlaGly
ACATGCAGGGCCTATTCCCCCAGGCCAGATGAGAGAACCAAGAGGAAGTGACATAGCAGG
ThrThrSerThrLeuGlnGluGlnIleGlyTrpMetThrSerAsnProProIleProVal
AACTACTAGTACCCTTCAAGAACAAATAGGATGGATGACAAGCAACCCACCTATCCCAGT
1100
GlyAspIleTyrLysArgTrpIleIleLeuGlyLeuAsnLysIleValArgMetTyrSer
GGGAGACATCTATAAAAGATGGATAATCCTGGGATTAAATAAAATAGTAAGAATGTATAG
1200
ProValSerIleLeuAspIleArgGlnGlyProLysGluProPheArgAspTyrValAsp
CCCTGTCAGCATTTTGGACATAAGACAAGGGCCAAAGGAACCTTTTAGAGACTATGTAGA
ArgPhePheLysThrLeuArgAlaGluGlnAlaThrGlnGluValLysAsnTrpMetThr
TAGGTTCTTTAAAACTCTCAGAGCTGAGCAAGCTACACAGGAGGTAAAAAATTGGATGAC
1300
GluThrLeuLeuValGlnAsnAlaAsnProAspCysLysThrIleLeuLysAlaLeuGly
AGAAACCTTGCTGGTCCAAAATGCGAATCCAGACTGTAAGACCATTTTAAAAGCATTAGG
ProGlyAlaThrLeuGluGluMetMetThrAlaCysGlnGlyValGlyGlyProSerHis
ACCAGGGGCTACATTAGAAGAAATGATGACAGCATGCCAGGGAGTGGGAGGACCCAGTCA
1400
LysAlaArgValLeuAlaGluAlaMetSerGlnAlaThrAsnSerThrAlaAlaIleMet
TAAAGCAAGAGTTTTGGCTGAGGCAATGAGCCAAGCAACAAATTCAACTGCTGCCATAAT
1500
MetGlnArgGlyAsnPheLysGlyGlnLysArgIleLysCysPheAsnCysGlyLysGlu
GATGCAGAGAGGTAATTTTAAGGGCCAGAAAAGAATTAAGTGTTTCAACTGTGGCAAAGA
GlyHisLeuAlaArgAsnCysArgAlaProArgLysLysGlyCysTrpLysCysGlyLys
AGGACACCTAGCCAGAAATTGCAGGGCCCCTAGGAAAAAGGGCTGTTGGAAATGTGGGAA
1600 (start of POL)
PhePheArgGluAsnLeu
GluGlyHisGlnMetLysAspCysThrGluArgGlnAlaAsnPheLeuGlyLysIleTrp
GGAAGGACACCAAATGAAAGACTGCACTGAGAGACAGGCTAATTTTTTAGGGAAAATTTG
AlaPheProGlnGlyLysAlaArgGluPheProSerGluGlnThrArgAlaAsnSerPro
ProSerHisLysGlyArgProGlyAsnPheLeuGlnSerArgProGluProThrAlaPro
GCCTTCCCACAAGGGAAGGCCAGGGAATTTCCTTCAGAGCAGACCAGAGCCAACAGCCCC
1700
ThrSerArgGluLeuArgValTrpGlyGlyAspLysThrLeuSerGluThrGlyAlaGlu
ProAlaGluSerPheGlyPheGlyGluGluIleLysProSerGlnLysGlnGluGlnLys
ACCAGCAGAGAGCTTCGGGTTTGGGGAGGAGATAAAACCCTCTCAGAAACAGGAGCAGAA
1800
ArgGlnGlyIleValSerPheSerPheProGlnIleThrLeuTrpGlnArgProValVal
AspLysGluLeuTyrProLeuAlaSerLeuLysSerLeuPheGlyAsnAspGlnLeuSer
AGACAAGGAATTGTATCCTTTAGCTTCCCTCAAATCACTCTTTGGCAACGACCAGTTGTC
(end of GAG)
ThrValArgValGlyGlyGlnLeuLysGluAlaLeuLeuAspThrGlyAlaAspAspThr
Gln
ACAGTAAGAGTAGGAGGACAGCTAAAAGAAGCTCTATTAGACACAGGAGCAGATGATACA
1900
ValLeuGluGluIleAsnLeuProGlyLysTrpLysProLysMetIleGlyGlyIleGly
GTATTAGAAGAAATAAATTTGCCAGGAAAATGGAAACCAAAAATGATAGGGGGAATTGGA
GlyPheIleLysValArgGlnTyrAspGlnIleLeuIleGluIleCysGlyLysLysAla
GGTTTTATCAAAGTAAGACAGTATGATCAAATACTTATAGAAATTTGTGGAAAAAAGGCT
2000
WO 87/07906
PCT/EP87/00326
IleGlyThrIleLeuValGlyProThrProValAsnIleIleGlyArgAsnMetLeuThr
ATAGGTACAATATTGGTAGGACCTACACCTGTCAACATAATTGGACGAAATATGTTGACT
2100
GlnIleGlyCysThrLeuAsnPheProIleSerProIleGluThrValProValLysLeu
CAGATTGGTTGTACTTTAAATTTTCCAATTAGTCCTATTGAGACTGTACCAGTAAAATTA
LysProGlyMetAspGlyProArgValLysGlnTrpProLeuThrGluGluLysIleLys
AAGCCAGGGATGGATGGCCCAAGGGTTAAACAATGGCCATTGACAGAAGAAAAAATAAAA
2200
AlaLeuThrGluIleCysLysAspMetGluLysGluGlyLysIleLeuLysIleGlyPro
GCATTAACAGAAATTTGTAAAGATATGGAAAAGGAAGGAAAAATTTTAAAAATTGGGCCT
GluAsnProTyrAsnThrProValPheAlaIleLysLysLysAspSerThrLysTrpArg
GAAAATCCATACAATACTCCAGTATTTGCCATAAAGAAAAAAGACAGCACTAAATGGAGA
2300
LysLeuValAsnPheArgGluLeuAsnLysArgThrGlnAspPheTrpGluValGlnLeu
AAATTAGTGAATTTCAGAGAGCTTAATAAAAGAACTCAAGATTTTTGGGAAGTTCAATTA
2400
GlyIleProHisProAlaGlyLeuLysLysLysLysSerValThrValLeuAspValGly
GGAATACCACATCCTGCTGGGTTGAAAAAGAAAAAATCAGTCACAGTATTGGATGTGGGG
AspAlaTyrPheSerValProLeuAspGluAspPheArgLysTyrThrAlaPheThrIle
GATGCATATTTTTCAGTCCCTTTAGATGAAGATTTCAGGAAGTATACTGCATTCACTATA
2500
ProSerIleAsnAsnGluThrProGlyIleArgTyrGlnTyrAsnValLeuProGlnGly
CCCAGTATTAATAATGAGACACCAGGGATTAGATATCAGTACAATGTGCTACCACAGGGA
TrpLysGlySerProAlaIlePheGlnSerSerMetThrLysIleLeuGluProPheArg
TGGAAAGGATCACCAGCAATATTCCAGAGTAGCATGACAAAAATCTTAGAACCCTTTAGA
2600
ThrLysAsnProGluIleValIleTyrGlnTyrMetAspAspLeuTyrValGlySerAsp
ACAAAAAATCCAGAAATAGTCATATACCAATACATGGATGATTTGTATGTAGGGTCTGAT
2700
LeuGluIleGlyGlnHisArgThrLysIleGluGluLeuArgGluHisLeuLeuLysTrp
TTAGAAATAGGACAACATAGAACAAAAATAGAGGAACTAAGAGAACATCTATTGAAATGG
GlyPheThrThrProAspLysLysHisGlnLysGluProProPheLeuTrpMetGlyTyr
GGATTTACCACACCAGACAAAAAGCATCAGAAAGAACCCCCATTTCTTTGGATGGGGTAT
2800
GluLeuHisProAspLysTrpThrValGlnProIleGlnLeuProAspLysGluSerTrp
GAACTCCACCCTGACAAATGGACAGTGCAGCCTATACAACTGCCAGACAAGGAAAGCTGG
ThrValAsnAspIleGlnLysLeuValGlyLysLeuAsnTrpAlaSerGlnIleTyrPro
ACTGTCAATGATATACAGAAATTGGTGGGAAAACTAAATTGGGCAAGTCAGATTTATCCA
2900
GlyIleLysValLysGlnLeuCysLysLeuLeuArgGlyAlaLysAlaLeuThrAspIle
GGAATTAAAGTAAAGCAATTATGTAAACTCCTTAGGGGAGCAAAAGCACTAACAGACATA
3000
ValProLeuThrAlaGluAlaGluLeuGluLeuAlaGluAsnArgGluIleLeuLysGlu
GTACCATTAACTGCAGAGGCAGAATTAGAATTGGCAGAGAACAGGGAAATTCTAAAAGAA
ProValHisGlyValTyrTyrAspProSerLysAspLeuIleAlaGluIleGlnLysGln
CCAGTGCATGGGGTATATTATGACCCATCAAAAGACTTAATAGCAGAAATACAGAAGCAG
3100
GlyGlnGlyGlnTrpThrTyrGlnIleTyrGlnGluGlnTyrLysAsnLeuLysThrGly
GGGCAAGGTCAATGGACATATCAAATATACCAAGAGCAATATAAAAATCTGAAAACAGGG
LysTyrAlaArgIleLysSerAlaHisThrAsnAspValLysGlnLeuThrGluAlaVal
AAGTATGCAAGAATAAAGTCTGCCCACACTAATGATGTAAAACAATTAACAGAAGCAGTG
3200
GlnLysIleAlaGlnGluSerIleValIleTrpGlyLysThrProLysPheArgLeuPro
CAAAAGATAGCCCAAGAAAGCATAGTAATATGGGGAAAAACTCCTAAATTTAGACTACCC
3300
IleGlnLysGluThrTrpGluAlaTrpTrpThrGluTyrTrpGlnAlaThrTrpIlePro
ATACAAAAAGAAACATGGGAGGCATGGTGGACAGAATATTGGCAAGCCACCTGGATCCCT
GluTrpGluPheValAsnThrProProLeuValLysLeuTrpTyrGlnLeuGluThrGlu
GAATGGGAGTTTGTCAATACTCCTCCCCTAGTAAAACTATGGTACCAGTTAGAAACAGAA
3400
ProIleValGlyAlaGluThrPheTyrValAspGlyAlaAlaAsnArgGluThrLysLys
CCCATAGTAGGAGCAGAAACTTTCTATGTAGATGGGGCAGCTAATAGAGAAACTAAAAAG
GlyLysAlaGlyTyrValThrAspArgGlyArgGlnLysValValSerLeuThrGluThr
GGAAAAGCAGGATATGTTACTGACAGAGGAAGACAAAAGGTTGTCTCCTTAACTGAAACA
3500
ThrAsnGlnLysThrGluLeuGlnAlaIleHisLeuAlaLeuGlnAspSerGlySerGlu
ACAAATCAGAAGACTGAATTACAAGCAATCCACTTAGCTTTACAGGATTCAGGATCAGAA
3600
ValAsnIleValThrAspSerGlnTyrAlaLeuGlyIleIleGlnAlaGlnProAspLys
GTAAACATAGTAACAGACTCACAGTATGCATTAGGGATTATTCAAGCACAACCAGATAAA
SerGluSerGluIleValAsnGlnIleIleGluGlnLeuIleGlnLysAspLysValTyr
AGTGAATCAGAGATTGTTAATCAAATAATAGAGCAATTAATACAGAAGGACAAGGTCTAC
3700
LeuSerTrpValProAlaHisLysGlyIleGlyGlyAsnGluGlnValAspLysLeuVal
CTGTCATGGGTACCAGCACACAAAGGGATTGGAGGAAATGAACAAGTAGATAAATTAGTC
SerSerGlyIleArgLysValLeuPheLeuAspGlyIleAspLysAlaGlnGluGluHis
AGCAGTGGAATCAGAAAGGTACTATTTTTAGATGGGATAGATAAGGCTCAAGAAGAACAT
3800

GluLysTyrHisSerAsnTrpArgAlaKetAlaSerAspPheAsnLeuProProIleVal 
GAAAAATATCACAGCAATTGGAGAGCAATGGCTAG'TGACTTTAATCTACCACCTATAGTA 

• . . 3900 

AlaLysGluIleValAlaSerCysAspLysCysGlnLeuLysGlyGluAlaUetEisGly 
GCGAAGGAAATAGTAGCCAGCTGTGATAAATGTCAACTAAAAGGGGAAGCCATGCATGGA 
• 

GlnValAspCysSerProGlylleTrpGlnLeuAspCysThrHisLeuGluGlvLysIle 
CAAGTAGACTGTAGTCCAGGGATATGGCAATTAGATTGCACACATCTAGAAGGAAAAATA
4000
IleIleValAlaValHisValAlaSerGlyTyrIleGluAlaGluValIleProAlaGlu
ATCATAGTAGCAGTCCATGTAGCCAGTGGATATATAGAAGCAGAAGTTATCCCAGCAGAA
ThrGlyGlnGluThrAlaTyrPheIleLeuLysLeuAlaGlyArgTrpProValLysVal
ACAGGACAGGAGACAGCATACTTTATACTAAAATTAGCAGGAAGATGGCCAGTAAAAGTA
4100
ValHisThrAspAsnGlySerAsnPheThrSerAlaAlaValLysAlaAlaCysTrpTrp
GTACACACAGACAATGGCAGCAATTTCACCAGTGCTGCAGTTAAAGCAGCCTGTTGGTGG
4200
AlaAsnIleLysGlnGluPheGlyIleProTyrAsnProGlnSerGlnGlyValValGlu
GCAAATATCAAACAGGAATTTGGAATTCCCTACAACCCCCAAAGTCAAGGAGTAGTGGAA
SerMetAsnLysGluLeuLysLysIleIleGlyGlnValArgGluGlnAlaGluHisLeu
TCTATGAATAAGGAATTAAAGAAAATCATAGGGCAGGTAAGAGAGCAAGCTGAACACCTT
4300
LysThrAlaValGlnMetAlaValPheIleHisAsnPheLysArgLysGlyGlyIleGly
AAGACAGCAGTACAAATGGCAGTGTTCATTCACAATTTTAAAAGAAAAGGGGGGATTGGG
GlyTyrSerAlaGlyGluArgIleIleAspMetIleAlaThrAspIleGlnThrLysGlu
GGGTACAGTGCAGGGGAAAGAATAATAGACATGATAGCAACAGACATACAAACTAAAGAA
4400
LeuGlnLysGlnIleThrLysIleGlnAsnPheArgValTyrTyrArgAspAsnArgAsp
TTACAAAAACAAATTACAAAAATTCAAAATTTTCGGGTTTATTACAGGGACAACAGAGAC
4500
ProIleTrpLysGlyProAlaLysLeuLeuTrpLysGlyGluGlyAlaValValIleGln
CCAATTTGGAAAGGACCAGCAAAACTACTCTGGAAAGGTGAAGGGGCAGTAGTAATACAG
(start of Q)
AspAsnSerAspIleLysValValProArgArgLysAlaLysIleIleArgAspTyrGly
MetGlu
GACAATAGTGATATAAAGGTAGTACCAAGAAGAAAAGCAAAAATCATTAGGGATTATGGA
4600 (end of POL)
LysGlnMetAlaGlyAspAspCysValAlaGlyGlyGlnAspGluAsp
AsnArgTrpGlnValMetIleValTrpGlnValAspArgMetArgIleArgThrTrpHis
AAACAGATGGCAGGTGATGATTGTGTGGCAGGTGGACAGGATGAGGATTAGAACATGGCA
SerLeuValLysHisHisMetTyrValSerLysLysAlaLysAsnTrpPheTyrArgHis
CAGTTTAGTAAAACATCATATGTATGTCTCAAAGAAAGCTAAAAATTGGTTTTATAGACA
4700
HisTyrGluSerArgHisProLysValSerSerGluValHisIleProLeuGlyAspAla
TCACTATGAAAGCAGGCATCCAAAAGTAAGTTCAGAAGTACACATCCCACTAGGGGATGC
4800
ArgLeuValValArgThrTyrTrpGlyLeuGlnThrGlyGluLysAspTrpHisLeuGly
TAGATTAGTAGTAAGAACATATTGGGGTCTGCAAACAGGAGAAAAAGACTGGCACTTGGG
HisGlyValSerIleGluTrpArgGlnLysArgTyrSerThrGlnLeuAspProAspLeu
TCATGGGGTCTCCATAGAATGGAGGCAGAAAAGATATAGCACACAACTAGATCCTGACCT
4900
AlaAspGlnLeuIleHisLeuTyrTyrPheAspCysPheSerGluSerAlaIleArgGln
AGCAGACCAACTGATTCATCTGTACTATTTTGATTGTTTTTCAGAATCTGCCATAAGACA
AlaIleLeuGlyHisIleValSerProArgCysAspTyrGlnAlaGlyHisAsnLysVal
AGCCATATTAGGACATATAGTTAGTCCTAGGTGTGATTATCAAGCAGGACATAACAAGGT
5000
GlySerLeuGlnTyrLeuAlaLeuThrAlaLeuIleAlaProLysLysThrArgProPro
AGGATCTTTACAGTATTTGGCACTAACAGCATTAATAGCACCAAAAAAGACAAGGCCACC
5100 (start of R)
MetGluGlnAlaProAlaAspGlnGly
LeuProSerValArgLysLeuThrGluAspArgTrpAsnLysProGlnGlnThrLysGly
TTTGCCTAGTGTTAGGAAGCTAACAGAAGATAGATGGAACAAGCCCCAGCAGACCAAGGG
ProGlnArgGluProHisAsnGluTrpThrLeuGluLeuLeuGluGluLeuLysGlnGlu
HisArgGlySerHisThrMetAsnGlyHis
CCACAGAGGGAGCCACACAATGAATGGACATTAGAACTTTTAGAGGAGCTTAAGCAAGAA
5200
AlaValArgHisPheProArgIleTrpLeuHisSerLeuGlyGlnHisIleTyrGluThr
GCTGTCAGACACTTTCCTAGGATATGGCTCCATAGTTTAGGACAACATATCTATGAAACT
TyrGlyAspThrTrpGluGlyValGluAlaIleIleArgSerLeuGlnGlnLeuLeuPhe
TATGGGGATACCTGGGAAGGAGTTGAAGCTATAATAAGAAGTCTGCAACAACTGCTGTTT
5300
IleHisPheArgIleGlyCysGlnHisSerArgIleGlyIleThrArgGlnArgArgAla
ATTCATTTCAGAATTGGGTGTCAACATAGCAGAATAGGCATTACTCGACAGAGAAGAGCA
5400 (start of S)
ArgAsnGlySerSerArgSer
MetAspProValAspProAsnLeuGluProTrpAsnHisProGlySerGlnProArg
AGAAATGGATCCAGTAGATCCTAACTTAGAGCCCTGGAACCATCCAGGGAGTCAGCCTAG
ThrProCysAsnLysCysTyrCysLysLysCysCysTyrHisCysGlnMetCysPheIle
GACGCCTTGTAATAAGTGTTATTGTAAAAAGTGCTGCTATCATTGCCAAATGTGCTTCAT
5500
ThrLysGlyLeuGlyIleSerTyrGlyArgLysLysArgArgGlnArgArgArgProPro
AACGAAAGGCTTAGGCATCTCCTATGGCAGGAAGAAGCGGAGACAGCGACGAAGACCTCC
(end of S)
GlnGlyAsnGlnAlaHisGlnAspProLeuProGluGln
TCAGGGCAATCAGGCTCATCAAGATCCTCTACCAGAGCAGTAAGTAGTATATGTAATACA
5600
ACCTTTAGTGATATTAGCAATAGTAGCATTAGTAGTAACGCTAATAATAGCAATAGTTGT
5700
GTGGACCATAGTATTTATAGAAATTAGGAAAATAAGAAGACAAAGGAAAATAGACAGGTT
(start of ENV)
MetArgValArgGluIleGlnArg
GATTGATAGAATAAGAGAAAGAGCAGAAGATAGTGGCATGAGAGTGAGGGAGATACAGA
5800
AsnTyrGlnAsnTrpTrpArgTrpGlyMetMetLeuLeuGlyMetLeuMetThrCysSer
GGAATTATCAAAACTGGTGGAGATGGGGCATGATGCTCCTTGGGATGTTGATGACCTGTA
IleAlaGluAspLeuTrpValThrValTyrTyrGlyValProValTrpLysGluAlaThr
GTATTGCAGAAGATTTGTGGGTTACAGTTTATTATGGGGTACCTGTGTGGAAAGAAGCAA
5900
ThrThrLeuPheCysAlaSerAspAlaLysSerTyrGluThrGluValHisAsnIleTrp
CCACTACTCTATTTTGTGCATCAGATGCTAAATCATATGAAACAGAAGTACATAACATCT
6000
AlaThrHisAlaCysValProThrAspProAsnProGlnGluIleGluLeuGluAsnVal
GGGCTACACATGCCTGTGTACCCACGGACCCCAACCCACAAGAAATAGAACTGGAAAATG
ThrGluGlyPheAsnMetTrpLysAsnAsnMetValGluGlnMetHisGluAspIleIle
TCACAGAAGGGTTTAACATGTGGAAAAATAACATGGTGGAGCAGATGCATGAGGATATAA
6100
SerLeuTrpAspGlnSerLeuLysProCysValLysLeuThrProLeuCysValThrLeu
TCAGTTTATGGGATCAAAGCCTAAAACCATGTGTAAAGCTAACCCCACTCTGTGTCACTT
AsnCysThrAsnValAsnGlyThrAlaValAsnGlyThrAsnAlaGlySerAsnArgThr
TAAACTGCACTAATGTGAATGGGACTGCTGTGAATGGGACTAATGCTGGGAGTAATAGGA
6200
AsnAlaGluLeuLysMetGluIleGlyGluValLysAsnCysSerPheAsnIleThrPro
CTAATGCAGAATTGAAAATGGAAATTGGAGAAGTGAAAAACTGCTCTTTCAATATAACCC
6300
ValGlySerAspLysArgGlnGluTyrAlaThrPheTyrAsnLeuAspLeuValGlnIle
CAGTAGGAAGTGATAAAAGGCAAGAATATGCAACTTTTTATAACCTTGATCTAGTACAAA
AspAspSerAspAsnSerSerTyrArgLeuIleAsnCysAsnThrSerValIleThrGln
TAGATGATAGTGATAATAGTAGTTATAGGCTAATAAATTGTAATACCTCAGTAATTACAC
6400
AlaCysProLysValThrPheAspProIleProIleHisTyrCysAlaProAlaGlyPhe
AGGCTTGTCCAAAGGTAACCTTTGATCCAATTCCCATACATTATTGTGCCCCAGCTGGTT
AlaIleLeuLysCysAsnAspLysLysPheAsnGlyThrGluIleCysLysAsnValSer
TTGCAATTCTAAAGTGTAATGATAAGAAGTTCAATGGAACGGAAATATGTAAAAATGTCA
6500
ThrValGlnCysThrHisGlyIleLysProValValSerThrGlnLeuLeuLeuAsnGly
GTACAGTACAATGTACACATGGAATTAAGCCAGTGGTGTCAACTCAACTGCTGTTAAATG
6600
SerLeuAlaGluGluGluIleMetIleArgSerGluAsnLeuThrAspAsnThrLysAsn
GCAGTCTAGCAGAAGAAGAGATAATGATTAGATCTGAAAATCTCACAGACAATACTAAAA
IleIleValGlnLeuAsnGluThrValThrIleAsnCysThrArgProGlyAsnAsnThr
ACATAATAGTACAGCTTAATGAAACTGTAACAATTAATTGTACAAGGCCTGGAAACAATA
6700
ArgArgGlyIleHisPheGlyProGlyGlnAlaLeuTyrThrThrGlyIleValGlyAsp
CAAGAAGAGGGATACATTTCGGCCCAGGGCAAGCACTCTATACAACAGGGATAGTAGGAG
IleArgArgAlaTyrCysThrIleAsnGluThrGluTrpAspLysThrLeuGlnGlnVal
ATATAAGAAGAGCATATTGTACTATTAATGAAACAGAATGGGATAAAACTTTACAACAGG
6800
AlaValLysLeuGlySerLeuLeuAsnLysThrLysIleIlePheAsnSerSerSerGly
TAGCTGTAAAACTAGGAAGCCTTCTTAACAAAACAAAAATAATTTTTAATTCATCCTCAG
6900
GlyAspProGluIleThrThrHisSerPheAsnCysArgGlyGluPhePheTyrCysAsn
GAGGGGACCCAGAAATTACAACACACAGTTTTAATTGTAGAGGGGAATTTTTCTACTGTA
ThrSerLysLeuPheAsnSerThrTrpGlnAsnAsnGlyAlaArgLeuSerAsnSerThr
ATACATCAAAACTGTTTAATAGTACATGGCAGAATAATGGTGCAAGACTAAGTAATAGCA
7000
GluSerThrGlySerIleThrLeuProCysArgIleLysGlnIleIleAsnMetTrpGln
CAGAGTCAACTGGTAGTATCACACTCCCATGCAGAATAAAACAAATTATAAATATGTGGC
LysThrGlyLysAlaMetTyrAlaProProIleAlaGlyValIleAsnCysLeuSerAsn
AGAAAACAGGAAAAGCTATGTATGCCCCTCCCATCGCAGGAGTCATCAACTGTTTATCAA
7100
IleThrGlyLeuIleLeuThrArgAspGlyGlyAsnSerSerAspAsnSerAspAsnGlu
ATATTACAGGGCTGATATTAACAAGAGATGGTGGAAATAGTAGTGACAATAGTGACAATG
7200
ThrLeuArgProGlyGlyGlyAspMetArgAspAsnTrpIleSerGluLeuTyrLysTyr
AGACCTTAAGACCTGGAGGAGGAGATATGAGGGACAATTGGATAAGTGAATTATATAAAT
LysValValArgIleGluProLeuGlyValAlaProThrLysAlaLysArgArgValVal
ATAAAGTAGTAAGAATTGAACCCCTAGGAGTAGCACCCACCAAGGCAAAGAGAAGAGTGG
7300
GluArgGluLysArgAlaIleGlyLeuGlyAlaMetPheLeuGlyPheLeuGlyAlaAla
TGGAAAGAGAAAAAAGAGCAATAGGACTAGGAGCCATGTTCCTTGGGTTCTTGGGAGCAG
GlySerThrMetGlyAlaAlaSerLeuThrLeuThrValGlnAlaArgGlnLeuLeuSer
CAGGAAGCACGATGGGCGCAGCGTCACTAACGCTGACGGTACAGGCCAGACAGTTACTGT
7400
GlyIleValGlnGlnGlnAsnAsnLeuLeuArgAlaIleGluAlaGlnGlnHisLeuLeu
CTGGTATAGTGCAACAGCAAAACAATTTGCTGAGGGCTATAGAGGCGCAACAGCATCTGT
7500
GlnLeuThrValTrpGlyIleLysGlnLeuGlnAlaArgValLeuAlaValGluArgTyr
TGCAACTCACGGTCTGGGGCATTAAACAGCTCCAGGCAAGAGTCCTGGCTGTGGAAAGAT
LeuGlnAspGlnArgLeuLeuGlyMetTrpGlyCysSerGlyLysHisIleCysThrThr
ACCTACAGGATCAACGGCTCCTAGGAATGTGGGGTTGCTCTGGAAAACACATTTGCACCA
7600
PheValProTrpAsnSerSerTrpSerAsnArgSerLeuAspAspIleTrpAsnAsnMet
CATTTGTGCCTTGGAACTCTAGTTGGAGTAATAGATCTCTAGATGACATTTGGAATAATA
ThrTrpMetGlnTrpGluLysGluIleSerAsnTyrThrGlyIleIleTyrAsnLeuIle
TGACCTGGATGCAGTGGGAAAAAGAAATTAGCAATTACACAGGCATAATATACAACTTAA
7700
GluGluSerGlnIleGlnGlnGluLysAsnGluLysGluLeuLeuGluLeuAspLysTrp
TTGAAGAATCGCAAATCCAGCAAGAAAAGAATGAAAAGGAATTATTGGAATTGGACAAGT
7800
AlaSerLeuTrpAsnTrpPheSerIleSerLysTrpLeuTrpTyrIleArgIlePheIle
GGGCAAGTTTGTGGAATTGGTTTAGCATATCAAAATGGCTGTGGTATATAAGAATATTCA
IleValValGlyGlyLeuIleGlyLeuArgIleIlePheAlaValLeuSerLeuValAsn
TAATAGTAGTAGGAGGCTTAATAGGTTTAAGAATAATTTTTGCTGTGCTTTCTTTAGTAA
7900
ArgValArgGlnGlyTyrSerProLeuSerLeuGlnThrLeuLeuProThrProArgGly
ATAGAGTTAGGCAGGGATACTCACCTCTGTCGTTGCAGACCCTCCTCCCAACACCGAGGG
ProProAspArgProGluGlyIleGluGluGluGlyGlyGluGlnGlyArgGlyArgSer
GACCACCCGACAGGCCCGAAGGAATAGAAGAAGAAGGTGGAGAGCAAGGCAGAGGCAGAT
8000
XaaProXaaValAsnGlyPheSerAlaLeuIleTrpAspAspLeuArgAsnLeuCysLeu
CNATNCCNNNGGTGAACGGATTCTCAGCACTTATCTGGGACGACCTGAGGAACCTGTGCC
8100
PheSerTyrHisArgLeuArgAspLeuLeuLeuIleAlaThrArgIleValGluLeuLeu
TCTTCAGTTACCACCGCTTGAGAGACTTACTCTTAATTGCAACGAGGATTGTGGAACTTC
GlyArgArgGlyTrpGluAlaLeuLysTyrLeuTrpAsnLeuLeuGlnTyrTrpGlyGln
TGGGACGCAGGGGGTGGGAAGCCCTCAAATATCTGTGGAATCTCCTGCAATATTGGGGTC
8200
GluLeuLysAsnSerAlaIleSerLeuLeuAsnThrThrAlaIleAlaValAlaGluCys
AGGAACTGAAGAATAGTGCTATTAGCTTGCTTAATACCACAGCAATAGCAGTAGCTGAAT
ThrAspArgValIleGluIleGlyGlnArgPheGlyArgAlaIleLeuHisIleProArg
GCACAGATAGGGTTATAGAAATAGGACAAAGATTTGGTAGAGCTATTCTCCACATACCTA
8300
(end of ENV, start of F)
MetGlyGlyLysTrpSerLys
ArgIleArgGlnGlyPheGluArgAlaLeuLeu
GAAGAATTAGACAGGGCTTCGAAAGGGCTTTGCTATAAGATGGGTGGCAAGTGGTCAAAA
8400
SerSerIleValGlyTrpProLysIleArgGluArgIleArgArgThrProProThrGlu
AGTAGCATAGTAGGATGGCCTAAGATTAGGGAAAGAATAAGACGAACTCCCCCAACAGAA
ThrGlyValGlyAlaValSerGlnAspAlaValSerGlnAspLeuAspLysCysGlyAla
ACAGGAGTAGGAGCAGTATCTCAAGATGCAGTATCTCAAGATTTAGATAAATGTGGAGCA
8500
AlaAlaSerSerSerProAlaAlaAsnAsnAlaSerCysGluProProGluGluGluGlu
GCCGCAAGCAGCAGTCCAGCAGCTAATAATGCTAGTTGTGAACCACCAGAAGAAGAGGAG
GluValGlyPheProValArgProGlnValProLeuArgProMetThrTyrLysGlyAla
GAGGTAGGCTTTCCAGTCCGTCCTCAGGTACCTTTAAGACCAATGACTTATAAAGGAGCT
8600 (U3)
PheAspLeuSerHisPheLeuLysGluLysGlyGlyLeuAspGlyLeuValTrpSerPro
TTTGATCTCAGCCACTTTTTAAAAGAAAAGGGGGGACTGGATGGGTTAGTTTGGTCCCCA
8700
LysArgGlnGluIleLeuAspLeuTrpValTyrHisThrGlnGlyTyrPheProAspTrp
AAAAGACAAGAAATCCTTGATCTGTGGGTCTACCACACACAAGGCTACTTCCCTGATTGG
GlnAsnTyrThrProGlyProGlyIleArgPheProLeuThrPheGlyTrpCysPheLys
CAGAATTACACACCAGGGCCAGGGATTAGATTCCCACTGACCTTCGGATGGTGCTTTAAG
8800
LeuValProMetSerProGluGluValGluGluAlaAsnGluGlyGluAsnAsnCysLeu
TTAGTACCAATGAGTCCAGAGGAAGTAGAGGAGGCCAATGAAGGAGAGAACAACTGTCTG
LeuHisProIleSerGlnHisGlyMetGluAspAlaGluArgGluValLeuLysTrpLys
TTACACCCTATTAGCCAACATGGAATGGAGGACGCAGAAAGAGAAGTGCTAAAATGGAAG
8900
PheAspSerSerLeuAlaLeuArgHisArgAlaArgGluGlnHisProGluTyrTyrLys
TTTGACAGCAGCCTAGCACTAAGACACAGAGCCAGNGAACAACATCCGGAGTACTACAAA
9000 (end of F)
AspCys
GACTGCTGACACAGAAGTTGCTGACAGGGGACTTTCCGCTGGGGACTTTCCAGGGGAGGC
GTAACTTGGGCGGGACCGGGGAGTGGCTAACCCTCAGATGCTGCATATAAGCAGCTGCNT
9100 (R)
TTCGCCTGTACTG[GGTCTCTCTTGTTAGACCAGGTCGAGCCCGGGAGCTCTCTGGCTAGC
AAGGAACCCACTGCTTAAGCCTCAATAAAGCTTGCCTTGAGTGCCTCAA 
9200 
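In the listing above, each three-letter amino-acid line appears to be a codon-by-codon translation of the nucleotide line printed beneath it, read in the frame of the labelled open reading frame. Where two frames overlap (for example GAG and POL, or POL and Q), one translation line is printed per frame. A codon that is split across two lines is shown with the line on which it begins. The following minimal Python sketch checks a nucleotide line against its printed translation. It assumes the standard genetic code. The function name `translate3` and its `frame` offset parameter are hypothetical illustrative choices, not part of the original document. So are the use of `Xaa` for codons containing non-ACGT symbols (such as `N`) and `Ter` for stop codons.

```python
from itertools import product

# Standard genetic code (assumption: the listing's translations follow it).
# Codons are enumerated with bases ordered T, C, A, G at each position.
_BASES = "TCAG"
_AA1 = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
_CODON_TABLE = {"".join(c): aa for c, aa in zip(product(_BASES, repeat=3), _AA1)}

# One-letter to three-letter codes; "*" (stop) is rendered as "Ter" (an assumption).
_THREE = {
    "A": "Ala", "R": "Arg", "N": "Asn", "D": "Asp", "C": "Cys",
    "Q": "Gln", "E": "Glu", "G": "Gly", "H": "His", "I": "Ile",
    "L": "Leu", "K": "Lys", "M": "Met", "F": "Phe", "P": "Pro",
    "S": "Ser", "T": "Thr", "W": "Trp", "Y": "Tyr", "V": "Val",
    "*": "Ter",
}


def translate3(dna: str, frame: int = 0) -> str:
    """Translate `dna` from offset `frame` (0, 1 or 2) into three-letter codes.

    Whitespace is removed first. Only complete codons are translated: a partial
    codon at the end of a line is ignored, since in the listing it is completed
    by the first bases of the following line.
    """
    seq = "".join(dna.split()).upper()
    out = []
    for i in range(frame, len(seq) - 2, 3):
        aa = _CODON_TABLE.get(seq[i:i + 3], "X")  # "X" if the codon has non-ACGT symbols
        out.append(_THREE.get(aa, "Xaa"))
    return "".join(out)


if __name__ == "__main__":
    line = "ACAGGAGTAGGAGCAGTATCTCAAGATGCAGTATCTCAAGATTTAGATAAATGTGGAGCA"
    print(translate3(line))
    # → ThrGlyValGlyAlaValSerGlnAspAlaValSerGlnAspLeuAspLysCysGlyAla
```

Applied to the nucleotide line beginning `ACAGGAGTAGGAGCA`, the sketch should reproduce the amino-acid line printed above it. For a line inside an overlap region, passing `frame=1` gives the second reading frame, and the final incomplete codon is left to the next line.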